Abstract: We consider estimation in a class of semiparametric transformation models for right-censored data. These models have gained much attention in survival analysis; however, most authors consider only regression models derived from frailty distributions whose hazards are decreasing. This paper considers estimation in a more flexible class of models and proposes conditional rank M-estimators for estimation of the Euclidean component of the model.



1. Introduction 

Semiparametric transformation models provide a common tool for regression analysis. We consider estimation in a class of such models designed for analysis of failure time data with time-independent covariates. Let $\mu$ be the marginal distribution of a covariate vector $Z$ and let $H(t\mid z)$ be the cumulative hazard function of the conditional distribution of failure time $T$ given $Z$. We assume that for $\mu$-almost all $z$ ($\mu$ a.e. $z$) this function is of the form

$$H(t\mid z) = A(\Gamma(t),\theta\mid z), \qquad (1.1)$$

where $\Gamma$ is an unknown continuous increasing function mapping the support of the failure time $T$ onto the positive half-line. For $\mu$ a.e. $z$, $A(x,\theta\mid z)$ is a conditional cumulative hazard function dependent on a Euclidean parameter $\theta$ and having hazard rate $\alpha(x,\theta\mid z)$ strictly positive at $x = 0$ and supported on the whole positive half-line. Special cases include

(i) the proportional hazards model with constant hazard rate $\alpha(x,\theta\mid z) = \exp(\theta^T z)$ (Lehmann [23], Cox [12]);

(ii) transformations to distributions with monotone hazards such as the proportional odds and frailty models or the linear hazard rate regression model (Bennett [?], Nielsen et al. [28], Kosorok et al. [22], Bogdanovicius and Nikulin [9]);

(iii) scale regression models induced by half-symmetric distributions (Section 3).

The proportional hazards model remains the most commonly used transformation model in survival analysis. Transformation to the exponential distribution entails that for any two covariate levels $z_1$ and $z_2$, the ratio of hazards is constant in $x$ and equal to $\alpha(x,\theta\mid z_1)/\alpha(x,\theta\mid z_2) = \exp(\theta^T[z_1 - z_2])$. Invariance of the model with respect to monotone transformations entails that this constancy of hazard ratios is preserved by the transformation model. However, in many practical circumstances this may fail to hold. For example, a new treatment ($z_1 = 1$) may be initially beneficial as compared to a standard treatment ($z_1 = 0$), but the effects may decay over time, $\alpha(x,\theta\mid z_1 = 1)/\alpha(x,\theta\mid z_1 = 0) \to 1$ as $x \uparrow \infty$. In such cases the proportional odds model or a transformation model derived from frailty distributions may be more appropriate. On the other hand, transformation to distributions with increasing or non-monotone hazards allows for modeling treatment effects which have divergent long-term effects or crossing hazards. Transformation models have also found application in regression analyses of multivariate failure time data, where models are often defined by means of copula functions and marginals are specified using models (1.1).
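To make the contrast concrete, the sketch below compares the hazard ratio $\alpha(x,\theta\mid 1)/\alpha(x,\theta\mid 0)$ under the proportional hazards core model with the same ratio under a proportional odds core model. For illustration only, we assume a scalar binary covariate and the proportional odds core model $A(x,\theta\mid z) = \log(1 + x e^{\theta z})$, whose hazard rate is $\alpha(x,\theta\mid z) = e^{\theta z}/(1 + x e^{\theta z})$. The function names are hypothetical.

```python
import math


def ph_hazard(x: float, theta: float, z: float) -> float:
    """Proportional hazards core model: alpha(x, theta | z) = exp(theta * z)."""
    return math.exp(theta * z)


def po_hazard(x: float, theta: float, z: float) -> float:
    """Proportional odds core model A(x, theta | z) = log(1 + x * exp(theta * z)).

    The hazard rate is the derivative of A with respect to x.
    """
    e = math.exp(theta * z)
    return e / (1.0 + x * e)


def hazard_ratio(hazard, x: float, theta: float) -> float:
    """Ratio alpha(x, theta | z=1) / alpha(x, theta | z=0)."""
    return hazard(x, theta, 1.0) / hazard(x, theta, 0.0)


theta = math.log(0.5)  # the new treatment initially halves the hazard
for x in (0.0, 1.0, 10.0, 1000.0):
    print(x, hazard_ratio(ph_hazard, x, theta), hazard_ratio(po_hazard, x, theta))
```

Under proportional hazards the ratio stays at 0.5 for every $x$. Under proportional odds it equals $e^\theta(1+x)/(1+xe^\theta)$, which starts at 0.5 and increases towards 1 as $x$ grows. This is the decaying treatment effect described above.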

We consider parameter estimation in the presence of right censoring. In the case of uncensored data, the model is invariant with respect to the group of increasing transformations mapping the positive half-line onto itself, so estimates of the parameter $\theta$ are often sought within the class of conditional rank statistics. Except for the proportional hazards model, the conditional rank likelihood does not have a simple tractable form, and estimation of the parameter $\theta$ requires joint estimation of the pair $(\theta,\Gamma)$. An extensive study of this estimation problem was given by Bickel [?], Klaassen [21] and Bickel and Ritov [5]. In particular, Bickel [?] considered the two-sample testing problem, $H_0: \theta = \theta_0$ vs $H_1: \theta > \theta_0$, in one-parameter transformation models. He used projection methods to show that a nonlinear rank statistic provides an efficient test, and applied Sturm-Liouville theory to obtain the form of its score function. Bickel and Ritov [5] and Klaassen [21] extended this result to show that under regularity conditions, the rank likelihood in regression transformation models forms a locally asymptotically normal family, and estimation of the parameter $\theta$ can be based on a one-step MLE procedure once a preliminary $\sqrt{n}$-consistent estimate of $\theta$ is given. Examples of such estimators, specialized to linear transformation models, can be found in [?], among others.

In the case of censored data, the estimation problem is not as well understood. Because of the popularity of the proportional hazards model, the most commonly studied choice of (1.1) corresponds to transformation models derived from frailty distributions. Murphy et al. [27] and Scharfstein et al. [31] proposed a profile likelihood method of analysis for the generalized proportional odds ratio models. The approach taken was similar to that for the classical proportional hazards model. The model (1.1) was extended to include all monotone functions $\Gamma$. With the parameter $\theta$ fixed, an approximate likelihood function for the pair $(\theta,\Gamma)$ was maximized with respect to $\Gamma$ to obtain an estimate $\Gamma_{n\theta}$ of the unknown transformation. The estimate $\Gamma_{n\theta}$ was shown to be a step function placing mass at each uncensored observation, and the parameter $\theta$ was estimated by maximizing the resulting profile likelihood. Under certain regularity conditions on the censoring distribution, the authors showed that the estimates are consistent, asymptotically Gaussian at rate $\sqrt{n}$, and asymptotically efficient for estimation of both components of the model. The profile likelihood method discussed in these papers originates from the counting process proportional hazards frailty intensity models of Nielsen et al. [28]. Murphy [?] and Parner [?] developed properties of the profile likelihood method in multi-jump counting process models. Kosorok et al. [22] extended the results to one-jump frailty intensity models with time-dependent covariates, including the gamma, the lognormal and the generalized inverse Gaussian frailty intensity models. Slud and Vonta [33] provided a separate study of consistency properties of the nonparametric maximum profile likelihood estimator in transformation models, assuming that the cumulative hazard function (1.1) is of the form $H(t\mid z) = A(\exp[\theta^T z]\,\Gamma(t))$, where $A$ is a known concave function.
Several authors have also proposed ad hoc estimates with good practical performance. In particular, Cheng et al. [11] considered estimation in the linear transformation model in the presence of censoring independent of covariates. They showed that estimation of the parameter $\theta$ can be accomplished without estimation of the transformation function by means of U-statistic estimating equations. The approach requires estimation of the unknown censoring distribution, and does not extend easily to models with censoring dependent on covariates. Further, Yang and Prentice [34] proposed minimum distance estimation in the proportional odds ratio model and showed that the unknown odds ratio function can be estimated based on a sample analogue of a linear Volterra equation. Bogdanovicius et al. considered estimation in a class of generalized proportional hazards intensity models that includes the transformation model (1.1) as a special case, and proposed a modified partial likelihood for estimation of the parameter $\theta$. As opposed to the profile likelihood method, the unknown transformation was profiled out from the likelihood using a martingale-based estimate of the unknown transformation obtained by solving a Volterra equation recursively.

In this paper we consider an extension of estimators studied by Cuzick [13] and Bogdanovicius et al. [?], [10] to a class of M-estimators of the parameter $\theta$. In Section 2 we apply a general method for construction of M-estimates in semiparametric models outlined in Chapter 7 of Bickel et al. [?]. In particular, the approach requires that the nuisance parameter and a consistent estimate of it be defined in a larger model $\mathcal{P}$ than the stipulated semiparametric model. Denote by $(X,\delta,Z)$ the triple corresponding to a nonnegative time variable $X$, a binary indicator $\delta$ and a covariate $Z$. In this paper we take $\mathcal{P}$ as the class of all probability measures such that the covariate $Z$ is bounded and the marginal distribution of the withdrawal times is either continuous or has a finite number of atoms. Under some regularity conditions on the core model $\{A(x,\theta\mid z) : \theta\in\Theta, x > 0\}$, we define a parameter $\Gamma_{P,\theta}$ as a mapping of $\mathcal{P}\times\Theta$ into a convex set of monotone functions. The parameter represents a transformation function that is defined as a solution to a nonlinear Volterra equation. We show that its "plug-in" estimate $\Gamma_{P_n,\theta}$ is consistent and asymptotically linear at rate $\sqrt{n}$. Here $P_n$ is the empirical measure of the data corresponding to an iid sample of the $(X,\delta,Z)$ observations. Further, we propose a class of M-estimators for the parameter $\theta$. The estimate will be obtained by solving a score equation $U_n(\theta) = 0$ or $U_n(\theta) = o_P(n^{-1/2})$ for $\theta$. Similarly to the case of the estimator $\Gamma_{n\theta}$, the score function $U_n(\theta)$ is well defined (as a statistic) for any $P\in\mathcal{P}$. It forms, however, an approximate V-process, so its asymptotic properties cannot be determined unless the "true" distribution $P\in\mathcal{P}$ is specified in sufficient detail (Serfling [?]). The properties of the score process will be developed under an added assumption. At the true $P\in\mathcal{P}$, the observation $(X,\delta,Z)\sim P$ is assumed to have the same distribution as $(T\wedge\tilde T, 1(T\le\tilde T), Z)$, where $T$ and $\tilde T$ represent failure and censoring times conditionally independent given the covariate $Z$. The conditional distribution of the failure time $T$ given $Z$ is assumed to follow the transformation model (1.1).

Under some regularity conditions, we show that the M-estimates converge at rate $\sqrt{n}$ to a normal limit with a simple variance function. By solving a Fredholm equation of the second kind, we also show that with an appropriate choice of the score process, the proposed class of censored data rank statistics includes estimators of the parameter $\theta$ whose asymptotic variance is equal to the inverse of the asymptotic variance of the M-estimating score function $\sqrt{n}\,U_n(\theta_0)$. We give a derivation of the resolvent and of the solution of the equation based on the Fredholm determinant formula. We also show that this is a Sturm-Liouville equation, though of a different form than in [?] and [?].
The class of transformation models considered in this paper is different from that in the literature on nonparametric maximum likelihood estimation (NPMLE); in particular, hazard rates of core models need not be decreasing. In Section 2, the core models are assumed to have hazards $\alpha(x,\theta,z)$ uniformly bounded between finite positive constants. With this aid we show that the mapping $\Gamma_{P,\theta}$ of $\mathcal{P}\times\Theta$ into the class of monotone functions is well defined on the entire support of the withdrawal time distribution, without any special conditions on the probability distribution $P$. Under the assumption that the upper support point $\tau_0$ of the withdrawal time distribution is a discontinuity point, the function $\Gamma_{P,\theta}$ is shown to be bounded. If $\tau_0$ is a continuity point of this distribution, the function $\Gamma_{P,\theta}(t)$ is shown to grow to infinity as $t\uparrow\tau_0$. In the absence of censoring, the model (1.1) assumes that the unknown transformation is an unbounded function, so we require $\Gamma_{P,\theta}$ to have this property as well. In Section 3, we use invariance properties of the model to show that the results can also be applied to hazards $\alpha(x,\theta,z)$ which are positive at the origin, but only locally bounded and locally bounded away from 0. All examples in this section refer to models whose conditional hazards are hyperbolic. This means they can be bounded (in a neighbourhood of the true parameter) between a linear function $a + bx$ and a hyperbola $(c + dx)^{-1}$, for some $a > 0$, $c > 0$ and $b > 0$, $d > 0$. As an example, we discuss the linear hazard rate transformation model, whose conditional hazard function is increasing but whose conditional density is decreasing or non-monotone. We also discuss the gamma frailty model with a fixed frailty parameter or with frailty parameters dependent on covariates.

We also examine in some detail scale regression models whose core models have cumulative hazards of the form $A_0(x\exp[\beta^T z])$. Here $A_0$ is a known cumulative hazard function of a half-symmetric distribution, with hazard rate $\alpha_0$. Our results apply to such models if, for some fixed $\xi\in[-1,1]$ and $\eta > 0$, the ratio $\alpha_0/g$ is locally bounded and locally bounded away from zero, where $g(x) = [1 + \eta x]^{\xi}$. We show that this choice includes half-logistic, half-normal and half-t scale regression models, whose conditional hazards are increasing or non-monotone while densities are decreasing. We also give examples of models (with coefficient $\xi\notin[-1,1]$) to which the results derived here cannot be applied.
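The role of $g$ can be illustrated numerically. The sketch below is a rough check under stated assumptions rather than a proof. It takes the standard half-logistic and half-normal distributions on the unit scale and uses $\eta = 1$. It evaluates $\alpha_0/g$ on a finite grid, pairing $\xi = 0$ with the half-logistic and $\xi = 1$ with the half-normal. These pairings are our illustrative choice. The half-logistic hazard $1/(1+e^{-x})$ increases from 1/2 to 1, and the half-normal hazard grows roughly linearly.

```python
import math


def half_logistic_hazard(x: float) -> float:
    """Hazard f/S of the standard half-logistic distribution: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def half_normal_hazard(x: float) -> float:
    """Hazard f/S of the standard half-normal distribution, with S(x) = erfc(x / sqrt(2))."""
    density = math.sqrt(2.0 / math.pi) * math.exp(-0.5 * x * x)
    return density / math.erfc(x / math.sqrt(2.0))


def g(x: float, xi: float, eta: float = 1.0) -> float:
    """g(x) = (1 + eta * x) ** xi, as in the text."""
    return (1.0 + eta * x) ** xi


grid = [0.05 * k for k in range(401)]  # x in [0, 20]
ratio_logistic = [half_logistic_hazard(x) / g(x, 0.0) for x in grid]
ratio_normal = [half_normal_hazard(x) / g(x, 1.0) for x in grid]
print(min(ratio_logistic), max(ratio_logistic))
print(min(ratio_normal), max(ratio_normal))
```

On this grid both ratios stay within fixed positive bounds, consistent with the condition stated above. Half-t models are not checked here.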

Finally, this paper considers only the gamma frailty model with the frailty parameter fixed or dependent on covariates. We show, however, that when the transformation is known to be the identity map, the gamma frailty regression model (frailty parameter independent of covariates) is not regular in its entire parameter range. When the transformation is unknown and the parameter set is restricted to $\eta > 0$, we show that the frailty parameter controls the shape of the transformation. We do not know at present whether there exists a class of conditional rank statistics which allows estimation of the parameter $\eta$ without additional regularity conditions on the unknown transformation.

In Section 4 we summarize the findings of this paper and outline some open 
problems. The proofs are given in the remaining 5 sections. 

2. Main results 

We shall first give regularity conditions on the model (Section 2.1). The asymp- 
totic properties of the estimate of the unknown transformation are discussed in 
Section 2.2. Section 2.3 introduces some additional notation. Section 2.4 consid- 
ers estimation of the Euclidean component of the model and gives examples of 
M-estimators of this parameter. 
2.1. The model

Throughout the paper we assume that $(X,\delta,Z)$ is defined on a complete probability space $(\Omega,\mathcal{F},P)$, and represents a nonnegative withdrawal time ($X$), a binary indicator ($\delta$) and a vector of covariates ($Z$). Set $N(t) = 1(X\le t, \delta = 1)$, $Y(t) = 1(X\ge t)$ and let $\tau_0 = \tau_0(P) = \sup\{t : E_P Y(t) > 0\}$. We shall make the following assumption about the "true" probability distribution $P$.

Condition 2.0. $P\in\mathcal{P}$, where $\mathcal{P}$ is the class of all probability distributions such that

(i) The covariate $Z$ has a nondegenerate marginal distribution $\mu$ and is bounded: $\mu(|Z|\le C) = 1$ for some constant $C$.

(ii) The function $E_P Y(t)$ has at most a finite number of discontinuity points, and $E_P N(t)$ is either continuous or discrete.

(iii) The point $\tau > 0$ satisfies $\inf\{t : E_P[N(t)\mid Z = z] > 0\} < \tau$ for $\mu$ a.e. $z$. In addition, $\tau = \tau_0$ if $\tau_0$ is a discontinuity point of $E_P Y(t)$, and $\tau < \tau_0$ if $\tau_0$ is a continuity point of $E_P Y(t)$.

For given $\tau$ satisfying Condition 2.0(iii), we denote by $\|\cdot\|_\infty$ the supremum norm in $\ell^\infty([0,\tau])$. The second set of conditions refers to the core model $\{A(\cdot,\theta\mid z) : \theta\in\Theta\}$.

Condition 2.1. (i) The parameter set $\Theta\subset\mathbb{R}^d$ is open, and $\theta$ is identifiable in the core model: $\theta\ne\theta'$ iff $A(\cdot,\theta\mid z)\ne A(\cdot,\theta'\mid z)$ $\mu$ a.e. $z$.

(ii) For $\mu$ almost all $z$, the function $A(\cdot,\theta\mid z)$ has a hazard rate $\alpha(\cdot,\theta\mid z)$. There exist constants $0 < m_1 < m_2 < \infty$ such that $m_1\le\alpha(x,\theta\mid z)\le m_2$ for $\mu$ a.e. $z$ and all $\theta\in\Theta$.

(iii) The function $\ell(x,\theta,z) = \log\alpha(x,\theta,z)$ is twice continuously differentiable with respect to both $x$ and $\theta$. The derivatives with respect to $x$ (denoted by primes) and with respect to $\theta$ (denoted by dots) satisfy

$$|\ell'(x,\theta,z)|\le\psi(x), \qquad |\ell''(x,\theta,z)|\le\psi_1(x),$$
$$|\dot\ell(x,\theta,z)|\le\psi_2(x), \qquad |\dot\ell'(x,\theta,z)|\le\psi_3(x),$$
$$|g(x,\theta,z) - g(x',\theta',z)|\le\max(\psi_p(x),\psi_p(x'))\,[|x - x'| + |\theta - \theta'|],$$

where $g = \dot\ell, \ell'$ and $\ell''$. Here $\psi$ is a constant or a continuous bounded decreasing function. The functions $\ell$, $\dot\ell$ and $\ell'$ are locally bounded. The functions $\psi_p$, $p = 1,2,3$, are continuous, bounded or strictly increasing, and such that $\psi_p(0) < \infty$,

$$\int_0^\infty e^{-x}\psi_1^2(x)\,dx < \infty, \qquad \int_0^\infty e^{-x}\psi_p(x)\,dx < \infty, \quad p = 2,3.$$

To evaluate the score process for estimation of the parameter 9, we shall use the 
following added assumption. 

Condition 2.2. The true distribution $P$, $P\in\mathcal{P}$, is the same as that of $(X,\delta,Z)\sim(T\wedge\tilde T, 1(T\le\tilde T), Z)$, where $T$ and $\tilde T$ represent failure and censoring times. The variables $T$ and $\tilde T$ are conditionally independent given $Z$. In addition

(i) The conditional cumulative hazard function of $T$ given $Z$ is of the form $H(t\mid z) = A(\Gamma_0(t),\theta_0\mid z)$ for $\mu$ a.e. $z$. Here $\Gamma_0$ is a continuous increasing function, and $A(x,\theta_0\mid z) = \int_0^x\alpha(u,\theta_0\mid z)\,du$, $\theta_0\in\Theta$, is a cumulative hazard function with hazard rate $\alpha(u,\theta_0\mid z)$ satisfying Conditions 2.1.






D. M. Dabrowska 



(ii) If $\tau_0$ is a discontinuity point of the survival function $E_P Y(t)$, then $\tau_0 = \sup\{t : P(\tilde T > t) > 0\} < \sup\{t : P(T > t) > 0\}$. If $\tau_0$ is a continuity point of this survival function, then $\tau_0 = \sup\{t : P(T > t) > 0\}\le\sup\{t : P(\tilde T > t) > 0\}$.

For $P\in\mathcal{P}$, let $A = A_P(t)$ be given by

$$A(t) = \int_0^t\frac{E_P N(du)}{E_P Y(u)}. \qquad (2.1)$$
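Replacing $E_P N$ and $E_P Y$ in (2.1) by their empirical counterparts gives the Aalen-Nelson estimator of $A$. The following minimal sketch assumes the data are given as arrays of observed times $X_i$ and indicators $\delta_i$. The function name `aalen_nelson` is hypothetical, and tied event times are handled by combining the jumps at each distinct time.

```python
import numpy as np


def aalen_nelson(x, delta):
    """Aalen-Nelson estimate of A(t) = int_0^t E N(du) / E Y(u).

    Returns the distinct uncensored times and the estimate just after each of them.
    """
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta, dtype=int)
    event_times = np.unique(x[delta == 1])
    # n times the empirical E Y(t): the number at risk, with Y(t) = 1(X >= t)
    at_risk = np.array([(x >= t).sum() for t in event_times])
    # n times the empirical jump of E N at t: failures observed at t
    failures = np.array([((x == t) & (delta == 1)).sum() for t in event_times])
    return event_times, np.cumsum(failures / at_risk)


times, a_hat = aalen_nelson([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1])
print(times, a_hat)  # jumps 1/4, 1/2 and 1 at times 1, 3 and 4
```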



If the censoring time $\tilde T$ is independent of covariates, then $A(t)$ reduces to the marginal cumulative hazard function of the failure time $T$, restricted to the interval $[0,\tau_0]$. Under Assumption 2.2, this parameter is in general a function of the marginal distribution of covariates and of the conditional distributions of both failure and censoring times. Nevertheless, we shall find it, and the associated Aalen-Nelson estimator, quite useful in the sequel. In particular, under Assumption 2.2, the conditional cumulative hazard function $H(t\mid z)$ of $T$ given $Z$ is uniformly dominated by $A(t)$. We have

$$A(t) = \int_0^t E[\alpha(\Gamma_0(u-),\theta_0,Z)\mid X\ge u]\,\Gamma_0(du)$$

and

$$\frac{H(dt\mid z)}{A(dt)} = \frac{\alpha(\Gamma_0(t-),\theta_0,z)}{E[\alpha(\Gamma_0(t-),\theta_0,Z)\mid X\ge t]}$$

for $t < \tau(z) = \sup\{t : E[Y(t)\mid Z = z] > 0\}$ and $\mu$ a.e. $z$. These identities suggest defining a parameter $\Gamma_{P,\theta}$ as the solution to the nonlinear Volterra equation

$$\Gamma_{P,\theta}(t) = \int_0^t\frac{E_P N(du)}{E_P[Y(u)\,\alpha(\Gamma_{P,\theta}(u-),\theta,Z)]} = \int_0^t\frac{A_P(du)}{E_P[\alpha(\Gamma_{P,\theta}(u-),\theta,Z)\mid X\ge u]} \qquad (2.2)$$



with boundary condition $\Gamma_{P,\theta}(0-) = 0$. Because Conditions 2.2 are not needed to solve this equation, we shall view $\Gamma$ as a map of the set $\mathcal{P}\times\Theta$ into $\mathcal{X} = \bigcup\{\mathcal{X}(P) : P\in\mathcal{P}\}$, where

$$\mathcal{X}(P) = \{g : g\ \text{increasing},\ e^{-g}\in D(\mathcal{T}),\ g\ll E_P N,\ m_2^{-1}A_P\le g\le m_1^{-1}A_P\}$$

and $m_1, m_2$ are the constants of Condition 2.1(ii). Here $D(\mathcal{T})$ denotes the space of right-continuous functions with left-hand limits. We choose $\mathcal{T} = [0,\tau_0]$ if $\tau_0$ is a discontinuity point of the survival function $E_P Y(t)$, and $\mathcal{T} = [0,\tau_0)$ if it is a continuity point. The assumption $g\ll E_P N$ means that the functions $g$ in $\mathcal{X}(P)$ are absolutely continuous with respect to the sub-distribution function $E_P N(t)$. The monotonicity condition implies that they admit the integral representation $g(t) = \int_0^t h(u)\,dE_P N(u)$ with $h\ge 0$, $E_P N$-almost everywhere.



2.2. Estimation of the transformation 

Let $(N_i,Y_i,Z_i)$, $i = 1,\ldots,n$, be an iid sample of the $(N,Y,Z)$ processes. Set $S(x,\theta,t) = n^{-1}\sum_{i=1}^n Y_i(t)\,\alpha(x,\theta,Z_i)$. Denote by $\dot S$ and $S'$ the derivatives of these processes with respect to $\theta$ (dots) and $x$ (primes), and let $s$, $\dot s$, $s'$ be the corresponding expectations. Suppressing dependence of the parameter $\Gamma_{P,\theta}$ on $P$, set

$$C_\theta(t) = \int_0^t\frac{EN(du)}{s^2(\Gamma_\theta(u-),\theta,u)}.$$

For $u < t$, define also

$$V_\theta(u,t) = \prod_{(u,t]}\bigl(1 - s'(\Gamma_\theta(w-),\theta,w)\,C_\theta(dw)\bigr). \qquad (2.3)$$

If $EN(t)$ is continuous, this product integral equals $\exp[-\int_u^t s'(\Gamma_\theta(w-),\theta,w)\,C_\theta(dw)]$. If $EN(t)$ is discrete, it equals $\prod_{u<w\le t}[1 - s'(\Gamma_\theta(w-),\theta,w)\,C_\theta(dw)]$.
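For a measure given by its increments on a grid, the two cases of (2.3) can be compared directly. The sketch is illustrative only. It takes generic increments $\Delta B(w)$ standing in for $s'(\Gamma_\theta(w-),\theta,w)\,C_\theta(dw)$; this is an assumption, since computing the actual increments requires $s'$ and $C_\theta$. It then checks that on a fine grid the discrete product approaches the exponential form used when $EN$ is continuous.

```python
import math

import numpy as np


def product_integral(increments):
    """Discrete product integral prod_{u < w <= t} (1 - dB(w)) over the given increments."""
    return float(np.prod(1.0 - np.asarray(increments, dtype=float)))


def exponential_form(increments):
    """Continuous-case expression exp(-int_u^t dB(w)) for the same total mass."""
    return math.exp(-float(np.sum(increments)))


# B(w) = 0.5 * w on (0, 1], split into k equal pieces
for k in (1, 10, 100_000):
    dB = np.full(k, 0.5 / k)
    print(k, product_integral(dB), exponential_form(dB))
```

With a single atom the two expressions differ (0.5 versus $e^{-0.5}$). This is why (2.3) treats the discrete and continuous cases separately.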

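Finally, we follow Bogdanovicius and Nikulin [9], and use

$$\Gamma_{n\theta}(t) = \int_0^t\frac{\bar N(du)}{S(\Gamma_{n\theta}(u-),\theta,u)}, \qquad \Gamma_{n\theta}(0-) = 0, \quad \theta\in\Theta,$$

to estimate the unknown transformation. Here $\bar N = n^{-1}\sum_{i=1}^n N_i$.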
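The integrand depends on $\Gamma_{n\theta}(u-)$ only, so the estimator can be computed by a forward recursion over the ordered uncensored times. At each such time $X_i$ the estimate jumps by $n^{-1}/S(\Gamma_{n\theta}(X_i-),\theta,X_i)$. The minimal sketch below assumes no tied observation times and a user-supplied hazard function `alpha(x, theta, z)` vectorized over covariates. Both names are hypothetical, as is the proportional odds hazard used in the example.

```python
import numpy as np


def gamma_n(x, delta, z, theta, alpha):
    """Forward recursion for Gamma_{n,theta}(t) = int_0^t Nbar(du) / S(Gamma(u-), theta, u).

    S(x, theta, u) = n^{-1} sum_j Y_j(u) alpha(x, theta, Z_j), with Y_j(u) = 1(X_j >= u).
    Returns the uncensored times in increasing order and Gamma_{n,theta} just after each.
    Assumes no ties among the observed times.
    """
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta, dtype=int)
    z = np.asarray(z, dtype=float)
    n = len(x)
    gamma_left = 0.0  # value before the next jump; boundary condition Gamma(0-) = 0
    times, values = [], []
    for i in np.argsort(x):
        if delta[i] != 1:
            continue
        at_risk = x >= x[i]
        s = alpha(gamma_left, theta, z[at_risk]).sum() / n
        gamma_left += (1.0 / n) / s
        times.append(x[i])
        values.append(gamma_left)
    return np.array(times), np.array(values)


def po_alpha(g, theta, z):
    """Hazard of the proportional odds core model A(x, theta | z) = log(1 + x e^{theta z})."""
    e = np.exp(theta * z)
    return e / (1.0 + g * e)


x_obs, d_obs, z_obs = [1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1], [0.0, 1.0, 0.0, 1.0]
t_ph, g_ph = gamma_n(x_obs, d_obs, z_obs, 0.0, lambda g, theta, z: np.exp(theta * z))
t_po, g_po = gamma_n(x_obs, d_obs, z_obs, 0.0, po_alpha)
print(g_ph, g_po)
```

With $\theta = 0$ in the proportional hazards model, the recursion reduces to the Aalen-Nelson estimator, with jumps 1/4, 1/2 and 1. The proportional odds hazard decreases in $x$, so it produces larger jumps as $\Gamma_{n\theta}$ grows.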

Proposition 2.1. Let $P\in\mathcal{P}$ be a distribution satisfying Conditions 2.0(i), (ii) and let $(X_i,\delta_i,Z_i)$, $i = 1,\ldots,n$, be an iid sample from this distribution. Suppose that Conditions 2.1 are fulfilled by the family $\{A(\cdot,\theta\mid z) : \theta\in\Theta\}$, and let $\tau$ be an arbitrary point such that Condition 2.0(iii) holds.

(i) Equation (2.2) has a unique locally bounded solution satisfying $0 < \Gamma_\theta(\tau_0) < \infty$ if $\tau_0 = \tau_0(P)$ is a discontinuity point of $E_P Y(t)$. It satisfies $\lim_{t\uparrow\tau_0}\Gamma_\theta(t) = \infty$ if $\tau_0$ is a continuity point of this survival function. For any point $\tau$, the plug-in estimate $\{\Gamma_{n\theta}(t) : t\le\tau, \theta\in\Theta\}$ satisfies $\sup_{\theta\in\Theta}\|\Gamma_{n\theta} - \Gamma_\theta\|_\infty\to 0$ a.s. In addition, if $\tau_0$ is a continuity point of $E_P Y(t)$, then $\sup\{|\exp(-\Gamma_\theta) - \exp(-\Gamma_{n\theta})|(t) : \theta\in\Theta, t\in\mathcal{T}\} = o_P(1)$.

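(ii) The function $\Theta\ni\theta\mapsto\{\Gamma_\theta(t) : t\in[0,\tau]\}\in\ell^\infty([0,\tau])$ is Fréchet differentiable with respect to $\theta$ and the derivative satisfies

$$\dot\Gamma_\theta(t) = -\int_0^t\dot s(\Gamma_\theta(u-),\theta,u)\,C_\theta(du)\,V_\theta(u,t).$$

The estimate $\{\dot\Gamma_{n\theta}(t) : t\le\tau, \theta\in\Theta\}$ satisfies $\sup_{\theta\in\Theta}\|\dot\Gamma_{n\theta} - \dot\Gamma_\theta\|_\infty\to 0$ a.s.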

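(iii) The process $\{W(t,\theta) = \sqrt{n}[\Gamma_{n\theta} - \Gamma_\theta](t) : t\le\tau, \theta\in\Theta\}$ converges weakly in $\ell^\infty([0,\tau]\times\Theta)$ to

$$\mathbb{W}(t,\theta) = R(t,\theta) - \int_{[0,t]}R(u-,\theta)\,C_\theta(du)\,\tilde V_\theta(u,t),$$

where $\tilde V_\theta(u,t) = 1(u\le t)\,s'(\Gamma_\theta(u-),\theta,u)\,V_\theta(u,t)$ and $R(t,\theta)$ is a mean zero Gaussian process. Its covariance function is given in Section 3.

(iv) Let $E_P N(t)$ be continuous, and let $\theta_0$ be an arbitrary point in $\Theta$. Suppose $\hat\theta$ is a $\sqrt{n}$-consistent estimate of it. Then the process $W_0 = \{W_0(t) : t\le\tau\}$, $W_0 = \sqrt{n}[\Gamma_{n\hat\theta} - \Gamma_{\theta_0} - (\hat\theta - \theta_0)\dot\Gamma_{\theta_0}]$, converges weakly in $\ell^\infty([0,\tau])$ to $\mathbb{W} = \mathbb{W}(\cdot,\theta_0)$.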

The proof of this proposition can be found in Section 6. 
2.3. Some auxiliary notation

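From now on we assume that the function $EN(t)$ is continuous. We shall need some auxiliary notation. Define

$$e[f](u,\theta) = \frac{E\{Y(u)\,[f\alpha](\Gamma_\theta(u),\theta,Z)\}}{E\{Y(u)\,\alpha(\Gamma_\theta(u),\theta,Z)\}},$$

where $f(x,\theta,Z)$ is a function of covariates. Likewise, for any two such functions $f_1$ and $f_2$, let $\mathrm{cov}[f_1,f_2](u,\theta) = e[f_1 f_2^T](u,\theta) - (e[f_1]\,e[f_2]^T)(u,\theta)$ and $\mathrm{var}[f](u,\theta) = \mathrm{cov}[f,f](u,\theta)$. We shall write

$$e(u,\theta) = e[\ell'](u,\theta), \qquad \dot e(u,\theta) = e[\dot\ell](u,\theta),$$
$$v(u,\theta) = \mathrm{var}[\ell'](u,\theta), \qquad \nu(u,\theta) = \mathrm{var}[\dot\ell](u,\theta), \qquad \rho(u,\theta) = \mathrm{cov}[\dot\ell,\ell'](u,\theta),$$

for short. Further, let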

$$K_\theta(t,t') = \int_0^{t\wedge t'}C_\theta(du)\,V_\theta(u,t)\,V_\theta(u,t'), \qquad B_\theta(t) = \int_0^t v(u,\theta)\,EN(du), \qquad (2.4)$$

and define

$$\mathcal{K}_\theta(\tau) = \iint_{0\le u\le t\le\tau}C_\theta(du)\,V_\theta(u,t)^2\,B_\theta(dt). \qquad (2.5)$$

This constant is finite for any point $\tau$ satisfying Condition 2.0(iii), but is in general infinite if $\tau = \tau_0$ is a continuity point of the survival function $EY(t)$. Finally, we set

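$$v_\varphi(t,\theta) = \nu(t,\theta) + v(t,\theta)\,\varphi_\theta^{\otimes 2}(t) - \rho(t,\theta)\,\varphi_\theta(t)^T - \varphi_\theta(t)\,\rho(t,\theta)^T,$$
$$\rho_\varphi(t,\theta) = \rho(t,\theta) - v(t,\theta)\,\varphi_\theta(t),$$

for any function $\varphi_\theta$ square integrable with respect to $B_\theta$. Under the added Condition 2.2, we have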



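$$e(u,\theta_0) = E[\ell'(\Gamma_{\theta_0}(X),\theta_0,Z)\mid X = u, \delta = 1],$$
$$v(u,\theta_0) = \mathrm{var}[\ell'(\Gamma_{\theta_0}(X),\theta_0,Z)\mid X = u, \delta = 1],$$
$$\dot e(u,\theta_0) = E[\dot\ell(\Gamma_{\theta_0}(X),\theta_0,Z)\mid X = u, \delta = 1],$$
$$\nu(u,\theta_0) = \mathrm{var}[\dot\ell(\Gamma_{\theta_0}(X),\theta_0,Z)\mid X = u, \delta = 1],$$
$$\rho(u,\theta_0) = \mathrm{cov}[\dot\ell(\Gamma_{\theta_0}(X),\theta_0,Z),\ \ell'(\Gamma_{\theta_0}(X),\theta_0,Z)\mid X = u, \delta = 1].$$

Similarly,

$$v_\varphi(u,\theta_0) = \mathrm{var}[\dot\ell(\Gamma_{\theta_0}(X),\theta_0,Z) - \ell'(\Gamma_{\theta_0}(X),\theta_0,Z)\,\varphi_{\theta_0}(X)\mid X = u, \delta = 1],$$
$$\rho_\varphi(u,\theta_0) = \mathrm{cov}[\dot\ell(\Gamma_{\theta_0}(X),\theta_0,Z) - \ell'(\Gamma_{\theta_0}(X),\theta_0,Z)\,\varphi_{\theta_0}(X),\ \ell'(\Gamma_{\theta_0}(X),\theta_0,Z)\mid X = u, \delta = 1].$$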
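For computation, the sample analogue of $e[f](u,\theta)$ is a weighted average of $f$ over the subjects at risk at $u$, with weights proportional to $Y_i(u)\,\alpha(\Gamma_\theta(u),\theta,Z_i)$. The sample analogue of $\mathrm{cov}[f_1,f_2]$ is the corresponding weighted covariance. The minimal sketch below assumes the weights and function values have already been evaluated at $u$; the function name is hypothetical.

```python
import numpy as np


def weighted_moments(f1, f2, weights):
    """Sample analogues of e[f1], e[f2] and cov[f1, f2] at a fixed (u, theta).

    weights[i] should be Y_i(u) * alpha(Gamma_theta(u), theta, Z_i); f1 and f2 hold the
    values of the two functions for each subject, with shape (n,) or (n, d).
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    f1 = np.asarray(f1, dtype=float).reshape(len(w), -1)
    f2 = np.asarray(f2, dtype=float).reshape(len(w), -1)
    e1 = w @ f1                       # e[f1]
    e2 = w @ f2                       # e[f2]
    e12 = (f1 * w[:, None]).T @ f2    # e[f1 f2^T]
    return e1, e2, e12 - np.outer(e1, e2)  # cov[f1, f2] = e[f1 f2^T] - e[f1] e[f2]^T


# two subjects at risk with weights 1 and 3
e1, _, var_f = weighted_moments([0.0, 1.0], [0.0, 1.0], [1.0, 3.0])
print(e1, var_f)
```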

However, $e[f]$, $\mathrm{var}[f]$ and $\mathrm{cov}[f,g]$ form conditional expectation and variance-covariance operators even when this assumption fails. This observation, the Cauchy-Schwarz inequality, and the monotone convergence theorem can be used to verify the next lemma.
Lemma 2.1. Suppose that Conditions 2.0 and 2.1 are satisfied. Let $EN(t)$ be continuous, and let $v(u,\theta)\ne 0$ a.e. $EN$.

(i) If $\mathcal{K}_\theta(\tau) < \infty$, then the kernel $K_\theta$ is square integrable with respect to $B_\theta$. In addition, if $e[(\dot\ell)^{\otimes 2}](u,\theta)\in L_1(EN)$, then $\dot\Gamma_\theta\in L_2(B_\theta)$.

(ii) Suppose that the integrability conditions of part (i) are satisfied. Let $\varphi_\theta(t) = \int_0^t g_\theta\,d\Gamma_\theta\in L_2(B_\theta)$ be any vector valued function. Then the matrices

$$\Sigma_{0,\varphi}(\theta,\tau) = \int_0^\tau v_\varphi(t,\theta)\,EN(dt),$$
$$\Sigma_{1,\varphi}(\theta,\tau) = \Sigma_{0,\varphi}(\theta,\tau) + \int_0^\tau\rho_\varphi(t,\theta)\,[\dot\Gamma_\theta(t) + \varphi_\theta(t)]^T\,EN(dt),$$
$$\Sigma_{2,\varphi}(\theta,\tau) = \Sigma_{0,\varphi}(\theta,\tau) + \int_0^\tau\!\!\int_0^\tau K_\theta(t,u)\,\rho_\varphi(t,\theta)\,\rho_\varphi(u,\theta)^T\,EN(du)\,EN(dt)$$

have finite components for any point $\tau\le\tau_0$.

Here $L_1(EN)$ is the space of functions integrable with respect to $EN$ and $L_2(B_\theta)$ is the space of functions square integrable with respect to $B_\theta$.

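Remark 2.1. For $\varphi_\theta = -\dot\Gamma_\theta$, we have $\Sigma_{1,\varphi}(\theta,\tau) = \Sigma_{2,\varphi}(\theta,\tau)$ if $\rho_{-\dot\Gamma}(u,\theta) = 0$ and $v(u,\theta)\ne 0$ a.e. $EN$. Indeed, $\dot\Gamma_\theta + \varphi_\theta = 0$ removes the correction term in $\Sigma_{1,\varphi}$, and $\rho_\varphi = 0$ removes the one in $\Sigma_{2,\varphi}$. If $v(u,\theta) = 0$, then for the sake of completeness, we define $\Sigma_{1,\varphi}(\theta,\tau) = \Sigma_{2,\varphi}(\theta,\tau) = \int_0^\tau\nu(u,\theta)\,EN(du)$. In this case $\ell'(x,\theta,z)$ is a function not depending on covariates. In particular, in the proportional hazards model, we have $\ell'(x,\theta,z) = 0$ for all $\theta$. In scale regression models with hazards $\alpha(x,\theta,z) = \exp(\theta^T z)\,\alpha_0(x\exp(\theta^T z))$, where $\alpha_0$ is a known function, we have $\ell'(x,\theta,z) = \alpha_0'(x)/\alpha_0(x)$ for $\theta = 0$ (independence).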

We shall now assume that the point $\tau$ satisfies Condition 2.0(iii). With this choice, any function $\varphi_\theta(t) = \int_0^t g_\theta\,d\Gamma_\theta$ of bounded variation on $[0,\tau]$ is square integrable with respect to the measure $B_\theta$ restricted to the interval $[0,\tau]$. However, in Proposition 2.3 we shall also allow $\tau = \tau_0$ to be a continuity point of the survival function $E_P Y(t)$, and assume the integrability conditions of Lemma 2.1.

2.4. Estimation of the Euclidean component of the model

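To estimate the parameter $\theta$, we use a solution to the score equation $U_n(\theta) = U_{n\varphi_n}(\theta) = 0$, where

$$U_{n\varphi_n}(\theta) = \frac{1}{n}\sum_{i=1}^n\int_0^\tau[b_{1i}(\Gamma_{n\theta}(t),t,\theta) - b_{2i}(\Gamma_{n\theta}(t),t,\theta)\,\varphi_{n\theta}(t)]\,N_i(dt), \qquad (2.6)$$

$$b_{1i}(x,t,\theta) = \dot\ell(x,\theta,Z_i) - [\dot S/S](x,\theta,t),$$
$$b_{2i}(x,t,\theta) = \ell'(x,\theta,Z_i) - [S'/S](x,\theta,t),$$

and $\varphi_{n\theta}(t)$ is an estimate of a function $\varphi_\theta(t) = \int_0^t g_\theta\,d\Gamma_\theta$. We shall make the following regularity assumption.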

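Condition 2.3. Suppose that Conditions 2.0-2.2 hold, and let $\|\cdot\|_v$ be the variation norm on the interval $[0,\tau]$. Let $B(\theta_0,\varepsilon_n) = \{\theta : |\theta - \theta_0| < \varepsilon_n\}$ for some sequence $\varepsilon_n\downarrow 0$, $\sqrt{n}\,\varepsilon_n\to\infty$. In addition

(i) The matrix $\Sigma_{0,\varphi}(\theta_0,\tau)$ is positive definite.

(ii) The matrix $\Sigma_{1,\varphi}(\theta_0,\tau)$ is non-singular.

(iii) The function $\varphi_{\theta_0}(t) = \int_0^t g_{\theta_0}\,d\Gamma_{\theta_0}$ satisfies $\|\varphi_{\theta_0}\|_v = O(1)$.

(iv) $\|\varphi_{n\theta_0} - \varphi_{\theta_0}\|_\infty\to_P 0$ and $\limsup_n\|\varphi_{n\theta_0}\|_v = O_P(1)$.

(v) We have either

(v.1) $\varphi_{n\theta} - \varphi_{n\theta'} = (\theta - \theta')\,\psi_{n\theta,\theta'}$, where $\limsup_n\sup\{\|\psi_{n\theta,\theta'}\|_v : \theta,\theta'\in B(\theta_0,\varepsilon_n)\} = O_P(1)$, or

(v.2) $\limsup_n\sup\{\|\varphi_{n\theta}\|_v : \theta\in B(\theta_0,\varepsilon_n)\} = O_P(1)$ and $\sup\{\|\varphi_{n\theta} - \varphi_\theta\|_\infty : \theta\in B(\theta_0,\varepsilon_n)\} = o_P(1)$.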

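Proposition 2.2. Suppose that Conditions 2.3(i)-(iv) hold.

(i) For any $\sqrt{n}$-consistent estimate $\hat\theta$ of the parameter $\theta_0$, $W_0 = \sqrt{n}[\Gamma_{n\hat\theta} - \Gamma_{\theta_0} - (\hat\theta - \theta_0)\dot\Gamma_{\theta_0}]$ converges weakly in $\ell^\infty([0,\tau])$ to a mean zero Gaussian process $\mathbb{W}_0$ with covariance function $\mathrm{cov}(\mathbb{W}_0(t),\mathbb{W}_0(t')) = K_{\theta_0}(t,t')$.

(ii) Suppose that Condition 2.3(v.1) is satisfied. Then, with probability tending to 1, the score equation $U_{n\varphi_n}(\theta) = 0$ has a unique solution $\hat\theta$ in $B(\theta_0,\varepsilon_n)$. Under Condition 2.3(v.2), the score equation $U_{n\varphi_n}(\theta) = o_P(n^{-1/2})$ has a solution, with probability tending to 1.

(iii) Define $[\hat T,\hat W_0]$, with $\hat T = \sqrt{n}(\hat\theta - \theta_0)$ and $\hat W_0 = \sqrt{n}[\Gamma_{n\hat\theta} - \Gamma_{\theta_0} - (\hat\theta - \theta_0)\dot\Gamma_{\theta_0}]$, where $\hat\theta$ are the estimates of part (ii). Then $[\hat T,\hat W_0]$ converges weakly in $\mathbb{R}^d\times\ell^\infty([0,\tau])$ to a mean zero Gaussian process $[T,\mathbb{W}_0]$ with covariance $\mathrm{cov}\,T = \Sigma_{1,\varphi}^{-1}(\theta_0,\tau)\,\Sigma_{2,\varphi}(\theta_0,\tau)\,[\Sigma_{1,\varphi}^{-1}(\theta_0,\tau)]^T$ and

$$\mathrm{cov}(T,\mathbb{W}_0(t)) = -\Sigma_{1,\varphi}^{-1}(\theta_0,\tau)\int_0^\tau K_{\theta_0}(t,u)\,\rho_\varphi(u,\theta_0)\,EN(du).$$

Here the matrices $\Sigma_{q,\varphi}$, $q = 1,2$, are defined as in Lemma 2.1.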

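(iv) Let $\tilde\theta_0$ be any $\sqrt{n}$-consistent estimate, and let $\varphi_n = \varphi_{n\tilde\theta_0}$ be an estimator of the function $\varphi_{\theta_0}$ such that $\|\varphi_n - \varphi_{\theta_0}\|_\infty = o_P(1)$ and $\limsup_n\|\varphi_n\|_v = O_P(1)$. Define a one-step M-estimator $\hat\theta = \tilde\theta_0 + \hat\Sigma_{1,\varphi_n}(\tilde\theta_0,\tau)^{-1}\,U_{n\varphi_n}(\tilde\theta_0)$, where $\hat\Sigma_{1,\varphi_n}$ is the plug-in analogue of the matrix $\Sigma_{1,\varphi}(\theta_0,\tau)$. Then part (iii) holds for the one-step estimator $\hat\theta$.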

The proof of this proposition is postponed to Section 7. 

Example 2.1. A simple choice of the $\varphi_\theta$ function is provided by $\varphi_\theta = 0 = \varphi_{n\theta}$. The resulting score equation is approximately equal to

$$\tilde U_n(\theta) = \frac{1}{n}\sum_{i=1}^n\bigl[N_i(\tau)\,\dot\ell(\Gamma_{n\theta}(X_i),\theta,Z_i) - \dot A(\Gamma_{n\theta}(X_i\wedge\tau),\theta\mid Z_i)\bigr],$$

and this score process may be easier to compute in some circumstances. If the transformation $\Gamma$ had been known, the right-hand side would have represented the MLE score function for estimation of the parameter $\theta$. Using results of Section 5, we can show the following. Solving the equation $\tilde U_n(\theta) = 0$ or $\tilde U_n(\theta) = o_P(n^{-1/2})$ for $\theta$ leads to an M-estimator asymptotically equivalent to the one in Proposition 2.2. However, this equivalence holds only at rate $\sqrt{n}$. In particular, at the true $\theta_0$, the two score processes satisfy $\sqrt{n}[U_n(\theta_0) - \tilde U_n(\theta_0)] = o_P(1)$, but they have different higher order expansions.
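As an illustration, consider the proportional hazards core model $\alpha(x,\theta\mid z) = e^{\theta z}$. Then $\dot\ell = z$ and $\dot A(x,\theta\mid z) = x z e^{\theta z}$. With $\tau$ beyond the largest observation, one can check that $\tilde U_n(\theta)$ reduces algebraically to the Cox partial likelihood score. The sketch below assumes a scalar covariate, no tied times and simulated exponential data; the simulation settings are illustrative assumptions, and the function names are hypothetical. It computes $\Gamma_{n\theta}$ by the recursion of Section 2.2, evaluates $\tilde U_n(\theta)$, and solves $\tilde U_n(\theta) = 0$ by bisection.

```python
import numpy as np


def gamma_ph(x, delta, z, theta):
    """Gamma_{n,theta}(X_i) for every i in the PH core model (alpha does not depend on x).

    Assumes no ties, so the risk set at the j-th smallest time is {j, j+1, ..., n}.
    """
    order = np.argsort(x)
    w = np.exp(theta * z[order])
    risk = np.cumsum(w[::-1])[::-1]                  # n * S(., theta, X_(j))
    jumps = np.where(delta[order] == 1, 1.0 / risk, 0.0)
    out = np.empty(len(x))
    out[order] = np.cumsum(jumps)                    # right-continuous: includes the jump at X_i
    return out


def score_tilde(theta, x, delta, z):
    """U~_n(theta) = n^{-1} sum_i [delta_i * ldot - Adot(Gamma_{n,theta}(X_i), theta | Z_i)]."""
    g = gamma_ph(x, delta, z, theta)
    return np.mean(delta * z - g * z * np.exp(theta * z))


def solve_score(x, delta, z, lo=-5.0, hi=5.0, iters=60):
    """Bisection for U~_n(theta) = 0; U~_n is decreasing in theta in the PH model."""
    f_lo = score_tilde(lo, x, delta, z)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = score_tilde(mid, x, delta, z)
        if f_mid * f_lo > 0:
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = np.random.default_rng(2024)
n, theta_true = 2000, 0.7
z = rng.binomial(1, 0.5, n).astype(float)
t = rng.exponential(1.0 / np.exp(theta_true * z))  # Gamma_0(t) = t, hazard exp(theta z)
c = rng.uniform(0.0, 3.0, n)
x, delta = np.minimum(t, c), (t <= c).astype(float)
theta_hat = solve_score(x, delta, z)
print(theta_hat)
```

Bisection is used only because the score is monotone in this particular model. For other core models, a root finder or the one-step estimator of Proposition 2.2(iv) may be more appropriate.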

Example 2.2. The second possible choice corresponds to $\varphi_\theta = -\dot\Gamma_\theta$. The score function $U_n(\theta)$ is in this case approximately equal to the derivative of the pseudo-profile likelihood criterion function considered by Bogdanovicius and Nikulin [9] in the case of generalized proportional hazards intensity models. Using results of Section 6, we can show that the sample analogue of the function $\dot\Gamma_\theta$ satisfies Conditions 2.3(iv) and 2.3(v).

Example 2.3. The logarithmic derivatives of $\ell(x,\theta,Z) = \log\alpha(x,\theta,Z)$ may be difficult to compute in some models, so we can try to replace them by different functions. In particular, suppose that $h(x,\theta,Z)$ is differentiable with respect to both arguments, and that its derivatives satisfy a Lipschitz continuity assumption similar to that in Condition 2.1. Consider the score process (2.6) with $\varphi_\theta = 0$, $\varphi_{n\theta} = 0$ and weights $b_{1i}(x,t,\theta) = h(x,\theta,Z_i) - [S_h/S](x,\theta,t)$. Here $S_h(x,\theta,t) = n^{-1}\sum_{i=1}^n Y_i(t)\,[h\alpha](x,\theta,Z_i)$. For $p = 0$ and $p = 2$, define matrices $\Sigma_p^h$ by replacing the functions $v_\varphi$ and $\rho_\varphi$ appearing in the matrices $\Sigma_{0,\varphi}$ and $\Sigma_{2,\varphi}$ with

$$v^h(t,\theta_0) = \mathrm{var}[h(\Gamma_{\theta_0}(X),\theta_0,Z)\mid X = t, \delta = 1],$$
$$\rho^h(t,\theta_0) = \mathrm{cov}[h(\Gamma_{\theta_0}(X),\theta_0,Z),\ \ell'(\Gamma_{\theta_0}(X),\theta_0,Z)\mid X = t, \delta = 1].$$

The matrix $\Sigma_{1,\varphi}(\theta_0,\tau)$ is changed to $\Sigma_1^h(\theta_0,\tau) = \int_0^\tau\rho_1^h(t,\theta_0)\,EN(dt)$, where the integrand is equal to

$$\rho_1^h(t,\theta_0) = \mathrm{cov}[h(\Gamma_{\theta_0}(X),\theta_0,Z),\ \dot\ell(\Gamma_{\theta_0}(X),\theta_0,Z) + \ell'(\Gamma_{\theta_0}(X),\theta_0,Z)\,\dot\Gamma_{\theta_0}(X)\mid X = t, \delta = 1].$$

The statement of Proposition 2.2 remains valid with the matrices $\Sigma_{p,\varphi}$ replaced by $\Sigma_p^h$, $p = 1,2$. This requires that, in Condition 2.3, the matrix $\Sigma_0^h$ is assumed positive definite and the matrix $\Sigma_1^h$ non-singular. The resulting estimates have a structure analogous to that of the M-estimates considered in the case of uncensored data by Bickel et al. [?] and Cuzick [13]. Alternatively, instead of the functions $\dot\ell(x,\theta,z)$ and $\ell'(x,\theta,z)$, the weight functions $b_{1i}$ and $b_{2i}$ can use logarithmic derivatives of a different distribution with the same parameter $\theta$. The asymptotic variance is of similar form as above. In both cases, the derivations are similar to those of Section 7, so we do not analyze these score processes in any detail.

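Example 2.4. Our final example shows that we can choose the $\varphi_\theta$ function so that the asymptotic variance of the estimate $\hat\theta$ is equal to the inverse of the asymptotic variance of the normalized score process $\sqrt{n}\,U_n(\theta_0)$. Remark 2.1 implies the following: if $\rho_{-\dot\Gamma}(u,\theta_0) = 0$ but $v(u,\theta_0)\ne 0$ a.e. $EN$, then for $\varphi_\theta = -\dot\Gamma_\theta$ the matrices $\Sigma_{q,\varphi}$, $q = 1,2$, are equal. This also holds for $v(u,\theta_0) = 0$. We shall now consider the case $v(u,\theta_0)\ne 0$ and $\rho_{-\dot\Gamma}(u,\theta_0)\ne 0$ a.e. $EN$. Without loss of generality, we shall assume that the parameter $\theta$ is one dimensional.
We shall show below that the equation

$$\varphi_\theta(t) + \int_0^\tau K_\theta(t,u)\,v(u,\theta)\,\varphi_\theta(u)\,EN(du) = -\dot\Gamma_\theta(t) + \int_0^\tau K_\theta(t,u)\,\rho(u,\theta)\,EN(du) \qquad (2.7)$$

has a unique solution $\varphi_\theta$ square integrable with respect to the measure $B_\theta$ of (2.4). For $\theta = \theta_0$, the corresponding matrices $\Sigma_{1,\varphi}(\theta_0,\tau)$ and $\Sigma_{2,\varphi}(\theta_0,\tau)$ are finite. They are also equal, as follows by substituting the conditional covariance function $\rho_\varphi(t,\theta_0) = \rho(t,\theta_0) - \varphi_{\theta_0}(t)\,v(t,\theta_0)$ into the matrix $\Sigma_{2,\varphi}(\theta_0,\tau)$. Indeed, (2.7) can be rewritten as $\varphi_\theta(t) + \dot\Gamma_\theta(t) = \int_0^\tau K_\theta(t,u)\,\rho_\varphi(u,\theta)\,EN(du)$, so the correction terms in $\Sigma_{1,\varphi}$ and $\Sigma_{2,\varphi}$ coincide. (In the multiparameter case, equation (2.7) is solved for each component of $\theta$.)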

Equation (2.7) simplifies if we replace the function $\varphi_\theta$ by $\psi_\theta=\varphi_\theta+\Gamma_\theta$. We get

$$\psi_\theta(t)-\lambda\int_0^\tau K_\theta(t,u)\,\psi_\theta(u)\,B_\theta(du)=\eta_\theta(t), \tag{2.8}$$

where $\lambda=-1$,

$$\eta_\theta(t)=\int_0^\tau K_\theta(t,u)\,\rho_\Gamma(u,\theta)\,EN(du),$$

$\rho_\Gamma(u,\theta)=v(u,\theta)\Gamma_\theta(u)+\rho(u,\theta)$, and $B_\theta$ is given by (2.4). For fixed $\theta$, the kernel $K_\theta$ is symmetric, positive definite and square integrable with respect to $B_\theta$. Therefore it has only positive eigenvalues, so $\lambda=-1$ is not a characteristic value of (2.8), and the equation has a unique solution given by

$$\psi_\theta(t)=\eta_\theta(t)-\int_0^\tau A_\theta(t,u,-1)\,\eta_\theta(u)\,B_\theta(du), \tag{2.9}$$

where $A_\theta(t,u,\lambda)$ is the resolvent corresponding to the kernel $K_\theta$. By definition, the resolvent satisfies a pair of integral equations

$$K_\theta(t,u)=A_\theta(t,u,\lambda)-\lambda\int_0^\tau A_\theta(t,w,\lambda)\,B_\theta(dw)\,K_\theta(w,u)=A_\theta(t,u,\lambda)-\lambda\int_0^\tau K_\theta(t,w)\,B_\theta(dw)\,A_\theta(w,u,\lambda),$$

where integration is with respect to different variables in the two equations. For $\lambda=-1$ the solution to the equation is given by

$$\psi_\theta(t)=\int_0^\tau K_\theta(t,u)\,\rho_\Gamma(u,\theta)\,EN(du)-\int_0^\tau A_\theta(t,w,-1)\,B_\theta(dw)\int_0^\tau K_\theta(w,u)\,\rho_\Gamma(u,\theta)\,EN(du),$$

and the resolvent equations imply that the right-hand side is equal to

$$\psi_\theta(t)=\int_0^\tau A_\theta(t,u,-1)\,\rho_\Gamma(u,\theta)\,EN(du). \tag{2.10}$$
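A small numerical sketch can make the resolvent identities concrete. It replaces integrals by weighted sums on a finite grid, so that $K_\theta$ becomes a matrix and $B_\theta(du)$, $EN(du)$ become vectors of point masses; the grid, the kernel $K[i,j]=\min(t_i,t_j)$ and the weights are made-up test inputs, and the function name is ours. It checks that $A=K(I+BK)^{-1}$, a discrete counterpart of $A_\theta(\cdot,\cdot,-1)$, satisfies both resolvent equations, and that (2.9) and (2.10) reproduce the solution of (2.8).

```python
import numpy as np

def discrete_resolvent_check(K, bw, ew, rho):
    """Discrete analogue of (2.8)-(2.10) with lambda = -1.

    K   : symmetric positive definite kernel matrix, K[i, j] ~ K_theta(u_i, u_j)
    bw  : point masses of B_theta(du) at the grid points
    ew  : point masses of EN(du) at the grid points
    rho : values of rho_Gamma(u_i, theta)
    """
    m = len(bw)
    B = np.diag(bw)
    eta = K @ (rho * ew)                              # eta(t) = int K rho_Gamma dEN
    psi = np.linalg.solve(np.eye(m) + K @ B, eta)     # (2.8): psi + K B psi = eta
    A = K @ np.linalg.inv(np.eye(m) + B @ K)          # resolvent at lambda = -1
    # Both resolvent equations with lambda = -1: K = A + A B K = A + K B A.
    resolvent_ok = np.allclose(K, A + A @ B @ K) and np.allclose(K, A + K @ B @ A)
    psi_29 = eta - A @ (bw * eta)                     # formula (2.9)
    psi_210 = A @ (rho * ew)                          # formula (2.10)
    return resolvent_ok, psi, psi_29, psi_210
```

The agreement of the three solution vectors is the matrix form of the push-through identity $(I+KB)^{-1}K=K(I+BK)^{-1}$.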

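For $\theta=\theta_0$, substitution of this expression into the formula for the matrices $\Sigma_{1,\varphi}(\theta_0,\tau)$ and $\Sigma_{2,\varphi}(\theta_0,\tau)$ and application of the resolvent equations also yields

$$\Sigma_{1,\varphi}(\theta_0,\tau)=\Sigma_{2,\varphi}(\theta_0,\tau)=\int_0^\tau v_\Gamma(u,\theta_0)\,EN(du)-\int_0^\tau\!\!\int_0^\tau A_{\theta_0}(t,u,-1)\,\rho_\Gamma(u,\theta_0)\,\rho_\Gamma(t,\theta_0)^T\,EN(du)\,EN(dt).$$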



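It remains to find the resolvent $A_\theta$. We first consider the case $\theta=\theta_0$. To simplify the algebra, we multiply both sides of equation (2.8) by $V_{\theta_0}(0,t)^{-1}=\exp\int_0^t s'(\Gamma_{\theta_0}(u),\theta_0,u)\,C_{\theta_0}(du)$. For this purpose set

$$\tilde\psi(t)=V_{\theta_0}(0,t)^{-1}\psi_{\theta_0}(t),\qquad \tilde\Gamma(t)=V_{\theta_0}(0,t)^{-1}\Gamma_{\theta_0}(t),$$
$$\tilde v(t,\theta_0)=v(t,\theta_0)\,V_{\theta_0}(0,t)^2,\qquad \tilde\rho_\Gamma(t,\theta_0)=V_{\theta_0}(0,t)\,\rho_\Gamma(t,\theta_0),$$
$$b(t)=\int_0^t\tilde v(u,\theta_0)\,dEN(u),\qquad c(t)=\int_0^t V_{\theta_0}(0,u)^{-2}\,dC_{\theta_0}(u).$$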

Multiplication of (2.8) by $V_{\theta_0}(0,t)^{-1}$ yields

$$\tilde\psi(t)+\int_0^\tau k(t,u)\,\tilde\psi(u)\,b(du)=\int_0^\tau k(t,u)\,\tilde\rho_\Gamma(u,\theta_0)\,EN(du), \tag{2.11}$$




where the kernel $k$ is given by $k(t,u)=c(t\wedge u)$. Since this is the covariance function of a time-transformed Brownian motion, we obtain a simpler equation. The solution to this Fredholm equation is

$$\tilde\psi(t)=\int_0^\tau A(t,u)\,\tilde\rho_\Gamma(u,\theta_0)\,EN(du), \tag{2.12}$$

where $A(t,u)=A(t,u,-1)$ and $A(t,u,\lambda)$ is the resolvent corresponding to the kernel $k$. More generally, we consider the equation

$$\tilde\psi(t)+\int_0^\tau k(t,u)\,\tilde\psi(u)\,b(du)=\tilde\eta(t). \tag{2.13}$$

Its solution is of the form

$$\tilde\psi(t)=\tilde\eta(t)-\int_0^\tau A(t,u)\,b(du)\,\tilde\eta(u).$$

To give the form of the function $A$, note that the constant $\kappa_{\theta_0}(\tau)$ defined in (2.5) satisfies

$$\kappa(\tau)=\kappa_{\theta_0}(\tau)=\int_0^\tau c(u)\,b(du).$$



Proposition 2.3. Suppose that Assumptions 2.0(i) and (ii) are satisfied and $v(u,\theta_0)\neq0$. For $j=0,1,2,3$, $n\ge1$ and $s<t$ define interval functions $\Psi_j(s,t)=\sum_{n\ge0}\Psi_{jn}(s,t)$ as follows:

$$\Psi_{00}(s,t)=1,\qquad \Psi_{20}(s,t)=1,$$
$$\Psi_{0n}(s,t)=\iint_{s<u_1<u_2\le t}\Psi_{0,n-1}(s,u_1-)\,c(du_1)\,b(du_2),\quad n\ge1,$$
$$\Psi_{1n}(s,t)=\int_{(s,t]}\Psi_{0n}(s,u-)\,c(du),\quad n\ge0,$$
$$\Psi_{2n}(s,t)=\iint_{s\le u_1<u_2<t}b(du_1)\,c(du_2)\,\Psi_{2,n-1}(u_2,t)=\int_{[s,t)}b(du_1)\,\Psi_{1,n-1}(u_1,t-),\quad n\ge1,$$
$$\Psi_{3n}(s,t)=\int_{[s,t)}\Psi_{2n}(s,u)\,b(du),\quad n\ge0.$$

For $j=2,3$, define $\Psi_{jn}(s,t+)$ by replacing the intervals $[s,t)$ with $[s,t]$ in the last two definitions, and similarly, for $j=0,1$, define $\Psi_{jn}(s,t-)$ by replacing the intervals $(s,t]$ with $(s,t)$ in the first two definitions. For $s>t$, set $\Psi_j(s,t)=0$, $j=0,1,2,3$, and let $\Psi_{j0}(t,t)=1$ for $j=0,2$, $\Psi_{10}(t,t)=c(\Delta t)$, $\Psi_{30}(t,t)=b(\Delta t)$, and $\Psi_{jn}(t,t)=0$ for $n\ge1$, $j=0,1,2,3$.

(i) We have

$$\Psi_0(s,t)=1+\int_{(s,t]}\Psi_1(s,u-)\,b(du)=1+\int_{(s,t]}c(du)\,\Psi_3(u,t+),$$
$$\Psi_1(s,t)=\int_{(s,t]}\Psi_0(s,u-)\,c(du)=\int_{(s,t]}c(du)\,\Psi_2(u,t+),$$
$$\Psi_2(s,t)=1+\int_{[s,t)}\Psi_3(s,u)\,c(du)=1+\int_{[s,t)}b(du)\,\Psi_1(u,t-),$$
$$\Psi_3(s,t)=\int_{[s,t)}\Psi_2(s,u)\,b(du)=\int_{[s,t)}b(du)\,\Psi_0(u,t-).$$

For any point $\tau$ satisfying Condition 2.0(iii), the functions $\Psi_j$, $j=0,1,2,3$, form bounded monotone increasing interval functions. In particular, $\Psi_0(s,t)\le\exp\kappa(\tau)$ and $\Psi_1(s,t)\le\Psi_0(s,t)[c(t)-c(s)]$. In addition, if $\tau_0$ is a continuity point of the survival function $E_PY(t)$ and $\kappa(\tau_0)<\infty$, then $\Psi_0(s,t)\le\exp\kappa(\tau_0)$ for any $0\le s<t<\tau_0$, while the remaining functions are locally bounded.

(ii) Suppose that $\tau$ satisfies Condition 2.0(iii), or else $\tau=\tau_0$, where $\tau_0$ is a continuity point of $E_PY(t)$ and $\kappa(\tau_0)<\infty$. The resolvent of the kernel $k$ is given by

$$A(s,t,-1)=A(s,t)=\Psi_0(0,\tau)^{-1}\,\Psi_1(0,s\wedge t)\,\Psi_0(s\vee t,\tau).$$

(iii) Under the assumptions of (ii), for any $\tilde\eta\in L_2(b)$, the solution to equation (2.13) satisfies $\tilde\psi\in L_2(b)$ and $\|\tilde\psi\|_2\le\|\tilde\eta\|_2\,[1+\Psi_0(0,\tau_0)\,\kappa(\tau_0)]$, where $\|\cdot\|_2$ is the $L_2$ norm with respect to the measure $b$.

(iv) Suppose that $\tau$ satisfies Condition 2.0(iii). If $\tilde\eta$ is a bounded function or a function of bounded variation, then the solution $\tilde\psi$ has the same properties, and the bounds of part (iii) hold in the supremum and variation norm, respectively.

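(v) The solution to equation (2.7) is given by

$$\varphi_{\theta_0}(t)=-\Gamma_{\theta_0}(t)+\int_0^\tau A(t,u)\,\rho_\Gamma(u,\theta_0)\,V_{\theta_0}(0,u)\,V_{\theta_0}(0,t)\,EN(du).$$

Under the assumptions of part (ii) and the integrability conditions of Lemma 2.2, we have $\varphi_{\theta_0}\in L_2(B_{\theta_0})$. We also have $\varphi_{\theta_0}(t)=\int_0^t g_{\theta_0}\,d\Gamma_{\theta_0}$, where

$$g_{\theta_0}(u)=[\dot s/s](\Gamma_{\theta_0}(u),\theta_0,u)-[s'/s](\Gamma_{\theta_0}(u),\theta_0,u)\,\varphi_{\theta_0}(u)-s(\Gamma_{\theta_0}(u),\theta_0,u)^{-1}\int_u^\tau V_{\theta_0}(u,t)\,\rho_\varphi(t,\theta_0)\,EN(dt).$$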

(vi) If $\tau$ satisfies Condition 2.0(iii), then the solution $\varphi_{\theta_0}$ is a function of bounded variation. Moreover, the constant $W=\Psi_0(0,\tau)$ satisfies $W=\Psi_1(0,t)\,\Psi_3(t,\tau)+\Psi_0(0,t)\,\Psi_0(t,\tau)$ for any $0<t<\tau$, and

$$\varphi_{\theta_0}(t)=\int_0^\tau A(t,u)\,\rho(u,\theta_0)\,V_{\theta_0}(0,u)\,V_{\theta_0}(0,t)\,EN(du)+\int_0^\tau\tilde A(t,u)\,s(\Gamma_{\theta_0}(u),\theta_0,u)\,V_{\theta_0}(0,u)\,V_{\theta_0}(0,t)\,c(du),$$

where $\tilde A(t,u)=W^{-1}[\Psi_0(0,u\wedge t)\,\Psi_0(u\vee t,\tau)-\Psi_1(0,u\wedge t)\,\Psi_3(u\vee t,\tau)]$.

The proof of this proposition is given in Section 8.

We have chosen to transform equation (2.8) in order to simplify calculations. The resolvent of the kernel $K_{\theta_0}$ corresponding to equation (2.8) can also be obtained from the recurrent Fredholm determinant formulas [25] applied to the kernel $K_{\theta_0}$. The same arguments can be applied to find the solution to equation (2.8) for $\theta\neq\theta_0$. The only difference is that for such points $\theta$ the kernel $K_\theta(t,u)$ does not represent the asymptotic covariance function of the process $\sqrt n\,[\Gamma_{n\theta}-\Gamma_\theta]$.

The sample analogue of the function $\varphi_\theta$ can be obtained in several ways. First, equations (2.7)–(2.8) can be solved directly by plugging in sample analogues of the functions $K$, $v$, $\rho$, etc. If these sample analogues are functions placing mass at each uncensored observation, then this choice is not convenient, because to solve the equation one must eventually invert an $m\times m$ matrix (here $m$ is the number of uncensored observations in the sample). Proposition 2.3 provides a simpler form of this equation. Define the estimates

$$c_{n\theta}(t)=\int_0^t V_{n\theta}(0,u)^{-2}\,C_{n\theta}(du),\qquad b_{n\theta}(t)=\int_0^t V_{n\theta}(0,u)^{2}\,B_{n\theta}(du),$$
$$C_{n\theta}(t)=\int_0^t S(\Gamma_{n\theta}(u-),\theta,u)^{-2}\,N_\cdot(du),\qquad V_{n\theta}(u,t)=\exp\Big[-\int_{(u,t]}S'(\Gamma_{n\theta}(w-),\theta,w)\,C_{n\theta}(dw)\Big],$$

and let $B_{n\theta}$ be the plug-in analogue of formula (2.4). Let $X^*_{(1)}<\dots<X^*_{(m)}$ be the distinct ordered uncensored observations in the sample. Then the discrete version of equations (2.11)–(2.13) is given by

$$\tilde\psi_{n\theta}(X^*_{(j)})+\sum_{i=1}^m c_{n\theta}(X^*_{(i)}\wedge X^*_{(j)})\,b_{n\theta}(\Delta X^*_{(i)})\,\tilde\psi_{n\theta}(X^*_{(i)})=\tilde\eta_{n\theta}(X^*_{(j)}),\qquad j=1,\dots,m.$$

Using Proposition 2.3, we have shown in an earlier version of this text that finding the solution to this equation amounts to inverting a band-symmetric tridiagonal matrix, which is easily implemented in practice. A numerical example is given in [lij ]. The formula (2.14) in part (v) gives in this case an estimate of the function $\varphi_\theta$ corresponding to equation (2.7). We show in [17] that it satisfies Condition 2.3.
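The reduction to a tridiagonal system can be illustrated as follows, assuming the values $c_{n\theta}(X^*_{(1)})<\dots<c_{n\theta}(X^*_{(m)})$ are strictly positive and increasing. The matrix $K=[c(X^*_{(i)}\wedge X^*_{(j)})]$ is then the covariance matrix of a Brownian motion observed at these times, so $K^{-1}$ is tridiagonal, and multiplying $(I+KB)\tilde\psi=\tilde\eta$ by $K^{-1}$ gives the symmetric tridiagonal system $(K^{-1}+B)\tilde\psi=K^{-1}\tilde\eta$ with $B$ diagonal. This sketch is our illustration of that idea, not the implementation referred to in the text; the arrays `c`, `b`, `eta` stand for $c_{n\theta}(X^*_{(i)})$, the point masses $b_{n\theta}(\Delta X^*_{(i)})$ and the right-hand side at the $X^*_{(i)}$, and all function names are hypothetical.

```python
import numpy as np

def min_kernel_precision(c):
    """Diagonal and off-diagonal of K^{-1}, K[i, j] = c[min(i, j)], 0 < c[0] < c[1] < ..."""
    c = np.asarray(c, dtype=float)
    d = np.diff(np.concatenate(([0.0], c)))   # increments c_k - c_{k-1}, with c_{-1} = 0
    diag = np.empty_like(c)
    diag[:-1] = 1.0 / d[:-1] + 1.0 / d[1:]
    diag[-1] = 1.0 / d[-1]
    off = -1.0 / d[1:]
    return diag, off

def tridiag_matvec(diag, off, x):
    """y = T x for the symmetric tridiagonal matrix T given by (diag, off)."""
    y = diag * x
    y[:-1] += off * x[1:]
    y[1:] += off * x[:-1]
    return y

def solve_sym_tridiag(diag, off, rhs):
    """Thomas algorithm; no pivoting needed since the matrix is positive definite."""
    n = len(diag)
    cp = np.zeros(max(n - 1, 0))
    dp = np.zeros(n)
    dp[0] = rhs[0] / diag[0]
    if n > 1:
        cp[0] = off[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off[i - 1] * cp[i - 1]
        if i < n - 1:
            cp[i] = off[i] / denom
        dp[i] = (rhs[i] - off[i - 1] * dp[i - 1]) / denom
    x = np.zeros(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def solve_discrete_equation(c, b, eta):
    """Solve psi_j + sum_i c[min(i, j)] b_i psi_i = eta_j in O(m) operations."""
    diag, off = min_kernel_precision(c)
    rhs = tridiag_matvec(diag, off, np.asarray(eta, dtype=float))   # K^{-1} eta
    return solve_sym_tridiag(diag + np.asarray(b, dtype=float), off, rhs)
```

A dense solve of $(I+KB)\tilde\psi=\tilde\eta$ gives the same vector at $O(m^3)$ cost, which is the inconvenience mentioned above.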

Finally, we show that in the continuous case equation (2.11) corresponds to a Sturm–Liouville equation. Suppose that the point $\tau$ satisfies Condition 2.0(iii). By twice "differentiating" (2.11) with respect to $dc$, we obtain the Sturm–Liouville equation

$$\frac{d}{dc}\Big[\frac{d}{dc}\tilde\psi\Big](t)=\tilde\psi(t)\,\frac{db}{dc}(t)-\tilde\rho_\Gamma(t,\theta_0)\,\frac{dEN}{dc}(t)$$

with boundary conditions $\tilde\psi(0)=0$, $\frac{d}{dc}\tilde\psi(t)\big|_{t=\tau}=0$. Its solution is of the form (2.12), with $A$ representing the Green's function associated with the homogeneous equation

$$\frac{d}{dc}\Big[\frac{d}{dc}\psi\Big](t)=\psi(t)\,\frac{db}{dc}(t) \tag{2.14}$$

and boundary conditions $\psi(0)=0$, $\frac{d}{dc}\psi(\tau)=0$. The Green's function is given by $A(t,u)=W^{-1}\psi_1(t\wedge u)\,\psi_0(t\vee u)$, where $\psi_1$ and $\psi_0$ is a pair of fundamental solutions, $\psi_1$ corresponding to the left boundary ($\psi_1(0)=0$) and $\psi_0$ corresponding to the right boundary ($\frac{d}{dc}\psi_0(\tau)=0$). Moreover,

$$W=-\Big[\psi_1(t)\,\frac{d}{dc}\psi_0(t)-\psi_0(t)\,\frac{d}{dc}\psi_1(t)\Big]$$

is the negative Wronskian (the right-hand side is a constant, not depending on $t$). By twice integrating the homogeneous equation subject to the boundary conditions, we obtain a pair of Volterra equations whose solutions are $a_1\Psi_1(0,t)$ and $a_0\Psi_0(t,\tau)$, where $a_0,a_1$ are arbitrary constants. The choice $a_0=a_1=1$ corresponds to the Volterra equations for $\Psi_0(t,\tau)$ and $\Psi_1(0,t)$ discussed in part (i) of Proposition 2.3. We also have $W=a_0a_1\Psi_0(0,\tau)$, which reduces to the constant $W$ of Proposition 2.3(vi) when $a_0=a_1=1$. Thus the function $A$ of Proposition 2.3 is the Green's function of this Sturm–Liouville equation. Note that this is a different equation than in Bickel and Bickel et al. In particular, it derives its form from the covariance function of a time-transformed Brownian motion rather than a Brownian bridge.
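As a sanity check of the Green's function representation, consider the hypothetical smooth case $c(t)=t$, $b(t)=\beta t$, $EN(du)=du$ on $[0,\tau]$. Summing the series of Proposition 2.3 gives $\Psi_1(0,t)=\sinh(\sqrt\beta\,t)/\sqrt\beta$, $\Psi_0(t,\tau)=\cosh(\sqrt\beta(\tau-t))$ and $W=\cosh(\sqrt\beta\,\tau)$. The sketch below checks by trapezoidal quadrature that $\tilde\psi(t)=\int_0^\tau A(t,u)f(u)\,du$, with $A(t,u)=W^{-1}\Psi_1(0,t\wedge u)\Psi_0(t\vee u,\tau)$, solves $\tilde\psi(t)+\beta\int_0^\tau(t\wedge u)\tilde\psi(u)\,du=\int_0^\tau(t\wedge u)f(u)\,du$, which is (2.11) with $\tilde\rho_\Gamma=f$. The values of $\beta$, $\tau$, the grid size and $f$ are illustrative choices, and the function name is ours.

```python
import numpy as np

def green_function_residual(beta=2.0, tau=1.0, n=801):
    """Max residual of (2.11) for psi given by (2.12) when c(t) = t, b(t) = beta t."""
    t = np.linspace(0.0, tau, n)
    h = t[1] - t[0]
    w = np.full(n, h)                     # trapezoidal quadrature weights
    w[0] = w[-1] = h / 2.0
    f = 1.0 + t**2                        # illustrative choice of rho_Gamma
    r = np.sqrt(beta)
    lo = np.minimum.outer(t, t)           # t ^ u, which is also k(t, u) = c(t ^ u)
    hi = np.maximum.outer(t, t)           # t v u
    W = np.cosh(r * tau)                  # Psi_0(0, tau)
    # A(t, u) = W^{-1} Psi_1(0, t ^ u) Psi_0(t v u, tau)
    A = (np.sinh(r * lo) / r) * np.cosh(r * (tau - hi)) / W
    psi = A @ (w * f)                                           # (2.12)
    residual = psi + beta * (lo @ (w * psi)) - lo @ (w * f)     # (2.11)
    return float(np.max(np.abs(residual))), float(psi[0])
```

The residual shrinks at the $O(h^2)$ rate of the trapezoidal rule, and $\tilde\psi(0)=0$ holds exactly, matching the left boundary condition.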
3. Examples

In this section we assume the conditional independence Assumption 2.2 and discuss Condition 2.1(ii) in more detail. It assumes that the hazard rate satisfies $m_1\le\alpha(x,\theta,z)\le m_2$ for $\mu$ a.e. $z$. This holds, for example, in the proportional hazards model if the covariates are bounded and the regression coefficients vary over a bounded neighbourhood of the true parameter. Recall that for any $P\in\mathcal P$, $\mathcal X(P)$ is the set of (sub)-distribution functions whose cumulative hazards $g$ satisfy $m_2^{-1}A\le g\le m_1^{-1}A$, where $A$ is the cumulative hazard function (2.1). This uniform boundedness is used in Section 6 to verify that equation (2.2) has a unique solution defined on the entire support of the withdrawal time distribution. This need not be the case in general, as the equation may have an explosive solution on an interval strictly contained in the support of this distribution ([20]).

We now consider the case of hazards $\alpha(x,\theta,z)$ which, for $\mu$ almost all $z$, are locally bounded and locally bounded away from 0. A continuous nonnegative function $f$ on the positive half-line is referred to here as locally bounded and locally bounded away from 0 if $f(0)>0$, $\lim_{x\to\infty}f(x)$ exists, and for any $d>0$ there exists a finite positive constant $k=k(d)$ such that $k^{-1}\le f(x)\le k$ for $x\in[0,d]$. In particular, hazards of this form may be unbounded functions growing to infinity or functions decaying to 0 as $x\uparrow\infty$.

To allow for this type of hazard, we note that the transformation model assumes only that the conditional cumulative hazard function of the failure time $T$ is of the form $H(t|z)=A(\Gamma(t),\theta|z)$ for some unspecified increasing function $\Gamma$. We can choose it as $\Gamma=\Phi(\tilde\Gamma)$, where $\tilde\Gamma$ is again an unknown increasing function and $\Phi$ is a known increasing differentiable function mapping the positive half-line onto itself with $\Phi(0)=0$. This is equivalent to selecting the reparametrized core model with cumulative hazard function $A(\Phi(x),\theta|z)$ and hazard rate $\alpha(\Phi(x),\theta|z)\,\phi(x)$, where $\phi=\Phi'$. If in the original model the hazard rate decays to 0 or increases to infinity at its tails, then in the reparametrized model the hazard rate may be a bounded function. Our results imply in this case that we can define a family of transformations $\tilde\Gamma_\theta$ bounded between $m_2^{-1}A(t)$ and $m_1^{-1}A(t)$. This in turn defines a family of transformations $\Gamma_\theta$ bounded between $\Phi(m_2^{-1}A(t))$ and $\Phi(m_1^{-1}A(t))$. More generally, the function $\Phi$ may depend on the unknown parameter $\theta$ and on the covariates. Of course, the selection of this reparametrization is not unique, but this merely means that different core models may generate the same semiparametric transformation model.

Example 3.1. Half-logistic and half-normal scale regression models. The assumption that the conditional distribution of a failure time $T$ given a covariate $Z$ has cumulative hazard function $H(t|z)=A_0(\Gamma(t)\exp[\theta^Tz])$ for some unknown increasing function $\Gamma$ (model I) is clearly equivalent to the assumption that this cumulative hazard function is of the form $H(t|z)=A_0(A_0^{-1}(\Gamma(t))\exp[\theta^Tz])$ for some unknown increasing function $\Gamma$ (model II). The corresponding core models have hazard rates

$$\text{model I:}\quad \alpha(x,\theta,z)=e^{\theta^Tz}\,\alpha_0(x\,e^{\theta^Tz}) \tag{3.1}$$

and

$$\text{model II:}\quad \alpha(x,\theta,z)=\frac{e^{\theta^Tz}\,\alpha_0(A_0^{-1}(x)\,e^{\theta^Tz})}{\alpha_0(A_0^{-1}(x))}, \tag{3.2}$$

respectively. In the case of the core model I, Condition 2.1(ii) is satisfied if the covariates are bounded, $\theta$ varies over a bounded neighbourhood of the true parameter, and $\alpha_0$ is a hazard rate that is bounded and bounded away from 0. An example is provided by the half-logistic transformation model with $\alpha_0(x)=[1+\tanh(x/2)]/2$. This is a bounded increasing function from 1/2 to 1.
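A quick numerical check of the half-logistic hazard rate: computing it directly as density over survival, with survival function $2/(1+e^x)$ and density $2e^x/(1+e^x)^2$, gives $e^x/(1+e^x)=[1+\tanh(x/2)]/2$, which increases from 1/2 at $x=0$ to 1 as $x\to\infty$. The function names below are ours.

```python
import math

def half_logistic_hazard(x):
    """Hazard of the half-logistic distribution, computed as density / survival."""
    survival = 2.0 / (1.0 + math.exp(x))
    density = 2.0 * math.exp(x) / (1.0 + math.exp(x)) ** 2
    return density / survival

def max_discrepancy(xs):
    """Largest gap between the direct hazard and the closed form [1 + tanh(x/2)] / 2."""
    return max(abs(half_logistic_hazard(x) - 0.5 * (1.0 + math.tanh(x / 2.0))) for x in xs)
```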

Next let us consider the half-normal transformation model. The half-normal distribution has survival function $\bar F_0(x)=2(1-\Phi(x))$, where $\Phi$ is the standard normal distribution function. The hazard rate is given by

$$\alpha_0(x)=x+\frac{\int_x^\infty\bar F_0(u)\,du}{\bar F_0(x)}.$$

The second term is the mean residual life $e_0(x)$ of the half-normal distribution, so that $\alpha_0(x)=x+e_0(x)$. The function $\alpha_0$ is increasing and unbounded, so Condition 2.1(ii) is not satisfied by the hazard rates (3.1). On the other hand, the reparametrized transformation model II has hazard rates

$$\alpha(x,\theta,z)=e^{\theta^Tz}\,\frac{A_0^{-1}(x)\,e^{\theta^Tz}+e_0\big(A_0^{-1}(x)\,e^{\theta^Tz}\big)}{A_0^{-1}(x)+e_0\big(A_0^{-1}(x)\big)}.$$

It can be shown that the right side satisfies $\exp(\theta^Tz)\le\alpha(x,\theta,z)\le\exp(2\theta^Tz)+\exp(\theta^Tz)$ for $\exp(\theta^Tz)\ge1$, and $\exp(2\theta^Tz)(1+\exp(\theta^Tz))^{-1}\le\alpha(x,\theta,z)\le\exp(\theta^Tz)$ for $\exp(\theta^Tz)<1$. These inequalities are used to verify that the hazard rates of the core model II satisfy the remaining requirements of Condition 2.1(ii).
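The two-sided bounds for model II can be spot-checked numerically. Writing $y=A_0^{-1}(x)$ and $a=\exp(\theta^Tz)$, (3.2) becomes $a\,\alpha_0(ay)/\alpha_0(y)$, where $\alpha_0$ is the half-normal hazard $\sqrt{2/\pi}\,e^{-y^2/2}/\operatorname{erfc}(y/\sqrt2)$. The sketch evaluates this on a finite grid of $y$ and $a$ (illustrative ranges only), using `math.erfc` for the tail; it is a spot check, not a proof, and the function names are ours.

```python
import math

def half_normal_hazard(y):
    """alpha_0(y) = density / survival of the half-normal distribution."""
    density = math.sqrt(2.0 / math.pi) * math.exp(-0.5 * y * y)
    survival = math.erfc(y / math.sqrt(2.0))
    return density / survival

def model_two_hazard(y, a):
    """Hazard (3.2) at x = A_0(y), with a = exp(theta^T z)."""
    return a * half_normal_hazard(a * y) / half_normal_hazard(y)

def bounds_hold(a_values, y_values, tol=1e-12):
    """Check the stated lower and upper bounds on every grid point."""
    for a in a_values:
        for y in y_values:
            h = model_two_hazard(y, a)
            if a >= 1.0:
                lower, upper = a, a * a + a
            else:
                lower, upper = a * a / (1.0 + a), a
            if not (lower - tol <= h <= upper + tol):
                return False
    return True
```

Keeping $ay\le 20$ avoids underflow of both the density and `erfc` in double precision.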

Condition 2.1 assumes that the support of the distribution of the core model 
corresponds to the whole positive half-line and thus it has a support independent 
of the unknown parameter. The next example deals with the situation in which this 
support may depend on the unknown parameter. 

Example 3.2. The gamma frailty model [3, 28] has cumulative hazard function

$$G(x,\theta|z)=\begin{cases}\eta^{-1}\log[1+\eta\,x\,e^{\beta^Tz}], & \theta=(\eta,\beta),\ \eta>0,\\ x\,e^{\beta^Tz}, & \eta=0,\\ \eta^{-1}\log[1+\eta\,x\,e^{\beta^Tz}], & \eta<0\ \text{and}\ -1<\eta\,e^{\beta^Tz}x<0.\end{cases}$$

The right-hand side can be recognized as the inverse of the cumulative hazard function of a Gompertz distribution.

For $\eta<0$ the model is not invariant with respect to the group of strictly increasing transformations of $\mathbb R_+$ onto itself. The unknown transformation $\Gamma$ must satisfy the constraint $-1<\eta\exp(\beta^Tz)\Gamma(t)<0$ for $\mu$ a.e. $z$. Thus its range is bounded and depends on $(\eta,\beta)$ and the covariates. Clearly, in this case the transformation model, which assumes that the function $\Gamma$ does not depend on covariates and parameters, does not make any sense. When specialized to the transformation $\Gamma(t)=t$, the model is also not regular. For example, for $\eta=-1$ the cumulative hazard function is the same as that of the uniform distribution on the interval $[0,\exp(-\beta^TZ)]$. As for the uniform distribution without covariates, the rate of convergence of the estimates of the regression coefficient is $n$ rather than $\sqrt n$. For other choices of the parameter $\bar\eta=-\eta$, the Hellinger distance between densities corresponding to parameters $\beta_1$ and $\beta_2$ is determined by the magnitude of

$$E_Z\,1(h^TZ>0)\,[1-\exp(-h^TZ)]^{1/\bar\eta}+E_Z\,1(h^TZ<0)\,[1-\exp(h^TZ)]^{1/\bar\eta},$$

where $h=\beta_2-\beta_1$. After expanding the exponents, this difference is of order $O(E_Z|h^TZ|^{1/\bar\eta})$, so that the model is regular for $\bar\eta<1/2$ and irregular for $\bar\eta>1/2$.
For $\eta>0$, the model is Hellinger differentiable both in the presence of covariates and in their absence ($\beta=0$). The densities are supported on the whole positive half-line. The hazard rates are given by $g(x,\theta|z)=\exp(\beta^Tz)[1+\eta\exp(\beta^Tz)x]^{-1}$. These are decreasing functions decaying to zero as $x\uparrow\infty$. Using the Gompertz cumulative hazard function $G^{-1}(x)=\eta^{-1}[e^{\eta x}-1]$ to reparametrize the model, we get $A(x,\theta|z)=G(G^{-1}(x),\theta|z)=\eta^{-1}\log[1+(e^{\eta x}-1)\exp(\beta^Tz)]$. The hazard rate of this model is $\alpha(x,\theta|z)=\exp(\beta^Tz+\eta x)[1+(e^{\eta x}-1)\exp(\beta^Tz)]^{-1}$. Pointwise in $\beta$, this function is bounded from above by $\max\{\exp(\beta^TZ),1\}$ and from below by $\min\{\exp(\beta^TZ),1\}$. The bounds are uniform for all $\eta\in[0,\infty)$, and the reparametrization preserves regularity of the model.
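The reparametrized gamma-frailty hazard and its bounds can be spot-checked numerically. The sketch compares a central finite difference of $A(x,\theta|z)=\eta^{-1}\log[1+(e^{\eta x}-1)e^{b}]$, with $b=\beta^Tz$, against the stated hazard rate, and checks that the hazard lies between $\min\{e^b,1\}$ and $\max\{e^b,1\}$. The grid of $(\eta,b,x)$ values and the function names are illustrative assumptions.

```python
import math

def cum_hazard(x, eta, b):
    """A(x, theta | z) for eta > 0, with b = beta^T z."""
    return math.log1p(math.expm1(eta * x) * math.exp(b)) / eta

def hazard(x, eta, b):
    """alpha(x, theta | z) = exp(b + eta x) / [1 + (e^{eta x} - 1) e^b]."""
    return math.exp(b + eta * x) / (1.0 + math.expm1(eta * x) * math.exp(b))

def check_gamma_bounds(etas, bs, xs, step=1e-6):
    """True if the derivative matches and the min/max bounds hold on the grid."""
    for eta in etas:
        for b in bs:
            lo, hi = min(math.exp(b), 1.0), max(math.exp(b), 1.0)
            for x in xs:
                a = hazard(x, eta, b)
                fd = (cum_hazard(x + step, eta, b) - cum_hazard(x - step, eta, b)) / (2 * step)
                if abs(fd - a) > 1e-5 * max(1.0, a):
                    return False
                if not (lo - 1e-12 <= a <= hi + 1e-12):
                    return False
    return True
```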

Note that the original core model has the property that for each parameter $\eta\ge0$ it describes a distribution with a different shape and upper tail behaviour. As a result, in the case of the transformation model, the unknown function $\Gamma$ is confounded by the parameter $\eta$. For example, at $\eta=0$ the unknown transformation $\Gamma$ represents a cumulative hazard function, whereas at $\eta=1$ it represents an odds ratio function. For any continuous variable $X$ having a nondefective distribution, we have $E\Gamma(X)=1$ if $\Gamma$ is the cumulative hazard function of $X$, and $E\Gamma(X)=\infty$ if $\Gamma$ is the odds ratio function of $X$. Since an odds ratio function diverges to infinity at a much faster rate than a cumulative hazard function, these are clearly very different parameters.
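The two expectations can be computed directly. Assuming, as in the text, that $X$ has a continuous nondefective distribution function $F$, the variable $F(X)$ is uniform on $(0,1)$, so

$$E\,\Gamma(X)=\int_0^1-\log(1-u)\,du=1\quad\text{for }\Gamma=-\log(1-F),\qquad E\,\Gamma(X)=\int_0^1\frac{u}{1-u}\,du=\infty\quad\text{for }\Gamma=\frac{F}{1-F}.$$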

The preceding entails that when $\eta\ge0$ is unknown, we are led to a constrained optimization problem and our results fail to apply. Since the parameter $\eta$ controls the shape and growth rate of the transformation, it is not clear why this parameter could be identifiable based on rank statistics instead of order statistics. But if omission of constraints is permissible, then the results of the previous section apply so long as the true regression coefficient satisfies $\beta_0\neq0$ and there exists a preliminary $\sqrt n$-consistent estimator of $\theta$. At $\beta_0=0$, the parameter $\eta$ is not identifiable based on ranks if the unknown transformation is only assumed to be continuous and completely specified. We do not know if such initial estimators exist, and the rank invariance arguments used in [14] suggest that the parameter $\eta$ is not identifiable based on rank statistics, because the models assuming that the cumulative hazard function is of the form $\eta^{-1}\log[1+c\,\eta\exp(\beta^Tz)\Gamma(t)]$ and $\eta^{-1}\log[1+c\exp(\beta^Tz)\Gamma(t)]$, $c>0$, $\eta>0$, all represent the same transformation model corresponding to the log-Burr core model with different scale parameter $c$. Because this scale parameter is not identifiable based on ranks, the restriction $c=1$ does not imply that $\eta$ may be identifiable based on rank statistics.

The difficulties arising in the analysis of the gamma frailty model with a fixed frailty parameter disappear if we assume that the frailty parameter $\eta$ depends on covariates. One possible choice corresponds to the assumption that the frailty parameter is of the form $\eta(z)=\exp(\xi^Tz)$. The corresponding cumulative hazard function is given by $\exp[-\xi^Tz]\log[1+\exp(\xi^Tz+\beta^Tz)\Gamma(t)]$. This is a frailty model assuming that, conditionally on $Z$ and an unobserved frailty variable $U$, the failure time $T$ follows a proportional hazards model with cumulative hazard function $U\Gamma(t)\exp(\beta^TZ)$, and that, conditionally on $Z$, the frailty variable $U$ has a gamma distribution with mean 1 and variance $\exp(\xi^Tz)$, that is, with shape parameter $\exp(-\xi^Tz)$ and scale parameter $\exp(\xi^Tz)$.
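The frailty representation can be checked by simulation. If $U$ is gamma distributed with shape $e^{-\xi^Tz}$ and scale $e^{\xi^Tz}$, the conditional survival function $E\exp[-U\Gamma(t)e^{\beta^Tz}]$ should equal $\exp\{-e^{-\xi^Tz}\log[1+e^{\xi^Tz+\beta^Tz}\Gamma(t)]\}$. The sketch uses Python's `random.Random.gammavariate(alpha, beta)`, whose second argument is the scale; the values of $\xi^Tz$, $\beta^Tz$ and $\Gamma(t)$ are arbitrary illustrations, and the function names are ours.

```python
import math
import random

def frailty_survival_mc(xi_z, beta_z, gamma_t, n=100_000, seed=1):
    """Monte Carlo estimate of E exp(-U Gamma(t) exp(beta^T z))."""
    rng = random.Random(seed)
    shape, scale = math.exp(-xi_z), math.exp(xi_z)   # mean 1, variance exp(xi^T z)
    s = gamma_t * math.exp(beta_z)
    return sum(math.exp(-s * rng.gammavariate(shape, scale)) for _ in range(n)) / n

def frailty_survival_exact(xi_z, beta_z, gamma_t):
    """exp(-exp(-xi^T z) log[1 + exp(xi^T z + beta^T z) Gamma(t)])."""
    return math.exp(-math.exp(-xi_z) * math.log1p(math.exp(xi_z + beta_z) * gamma_t))
```

With $10^5$ draws the Monte Carlo standard error is below 0.002, so the two functions should agree to about two decimal places.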

Example 3.3. Linear hazard model. The core model has hazard rate $h(x,\theta|z)=a_\theta(z)+x\,b_\theta(z)$, where $a_\theta(z)$, $b_\theta(z)$ are nonnegative functions of the covariates dependent on a Euclidean parameter $\theta$. The cumulative hazard function is equal to $H(t|z)=a_\theta(z)t+b_\theta(z)t^2/2$. Note that the shape of the density depends on the parameters $a$ and $b$: it may be either a decreasing or a non-monotone function.

Suppose that $b_\theta(z)>0$, $a_\theta(z)>0$. To reparametrize the model we use $G^{-1}(x)=(1+2x)^{1/2}-1$. The reparametrized model has cumulative hazard function $A(x,\theta|z)=H(G^{-1}(x),\theta|z)$ with hazard rate $\alpha(x,\theta,z)=a_\theta(z)(1+2x)^{-1/2}+b_\theta(z)[1-(1+2x)^{-1/2}]$. The hazard rates are decreasing in $x$ if $a_\theta(z)>b_\theta(z)$, constant in $x$ if $a_\theta(z)=b_\theta(z)$, and bounded increasing if $a_\theta(z)<b_\theta(z)$. Pointwise in $z$, the hazard rates are bounded from above by $\max\{a_\theta(z),b_\theta(z)\}$ and from below by $\min\{a_\theta(z),b_\theta(z)\}$. Thus our regularity conditions are satisfied so long as, in some neighbourhood of the true parameter $\theta_0$, these maxima and minima stay bounded and bounded away from 0, and the functions $a_\theta$, $b_\theta$ satisfy appropriate differentiability conditions. Finally, a sufficient condition for identifiability of the parameters is that at a known reference point $z_0$ in the support of the covariates we have $a_\theta(z_0)=1=b_\theta(z_0)$, $\theta\in\Theta$, and

$$[a_\theta(z)=a_{\theta'}(z)\ \text{and}\ b_\theta(z)=b_{\theta'}(z)\ \ \mu\text{ a.e. }z]\ \Rightarrow\ \theta=\theta'.$$
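The reparametrized linear-hazard rate can be verified in the same way: differentiate $A(x)=H(G^{-1}(x))$ numerically, with $H(t)=at+bt^2/2$ and $G^{-1}(x)=(1+2x)^{1/2}-1$, compare with the closed form, and check the min/max bounds. Here `a` and `b` stand for $a_\theta(z)$ and $b_\theta(z)$ at a fixed $z$; the numerical values and function names are illustrative.

```python
import math

def reparam_cum_hazard(x, a, b):
    """A(x) = H(G^{-1}(x)) with H(t) = a t + b t^2 / 2."""
    t = math.sqrt(1.0 + 2.0 * x) - 1.0          # G^{-1}(x)
    return a * t + b * t * t / 2.0

def reparam_hazard(x, a, b):
    """alpha(x) = a (1 + 2x)^{-1/2} + b [1 - (1 + 2x)^{-1/2}]."""
    r = 1.0 / math.sqrt(1.0 + 2.0 * x)
    return a * r + b * (1.0 - r)

def check_linear_hazard(pairs, xs, step=1e-6):
    """True if the finite difference matches and min(a, b) <= alpha <= max(a, b)."""
    for a, b in pairs:
        for x in xs:
            alpha = reparam_hazard(x, a, b)
            fd = (reparam_cum_hazard(x + step, a, b) - reparam_cum_hazard(x - step, a, b)) / (2 * step)
            if abs(fd - alpha) > 1e-6 * max(1.0, alpha):
                return False
            if not (min(a, b) - 1e-12 <= alpha <= max(a, b) + 1e-12):
                return False
    return True
```

When $a=b$ the cumulative hazard reduces to $A(x)=ax$, which is the constant-hazard case mentioned above.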

Returning to the original linear hazard model, we have excluded the boundary region $a_\theta(z)=0$ or $b_\theta(z)=0$. These boundary regions lead to lack of identifiability. For example,

model 1: $a_\theta(z)=0$ $\mu$ a.e. $z$,
model 2: $b_\theta(z)=0$ $\mu$ a.e. $z$,
model 3: $a_\theta(z)=c\,b_\theta(z)$ $\mu$ a.e. $z$,

where $c>0$ is an arbitrary constant, represent the same proportional hazards model. The reparametrized model does not include the first two models but, depending on the choice of the parameter $\theta$, it may include the third model (with $c=1$).

Example 3.4. Half-t and polynomial scale regression models. In this example we assume that the core model has cumulative hazard $A_0(x\exp[\theta^Tz])$ for some known function $A_0$ with hazard rate $\alpha_0$. Suppose that $c_1\le\exp(\theta^Tz)\le c_2$ for $\mu$ a.e. $z$.

For fixed $\xi\ge-1$ and $\eta>0$, let $G^{-1}$ be the inverse cumulative hazard function corresponding to the hazard rate $g(x)=[1+\eta x]^\xi$. If $\alpha_0/g$ is a function locally bounded and locally bounded away from zero such that $\lim_{x\to\infty}\alpha_0(x)/g(x)=c$ for a finite positive constant $c$, then for any $\varepsilon\in(0,c)$ there exist constants $0<m_1(\varepsilon)<m_2(\varepsilon)<\infty$ such that the hazard rate of $A_0(G^{-1}(x)\exp[\theta^Tz])$ is bounded between $m_1(\varepsilon)$ and $m_2(\varepsilon)$. Indeed, writing $y=G^{-1}(x)$, this hazard rate equals $[\alpha_0/g](y\,e^{\theta^Tz})\cdot e^{\theta^Tz}g(y\,e^{\theta^Tz})/g(y)$. Using $c_1\le\exp(\theta^Tz)\le c_2$ and the monotonicity properties of the function $g(x)$, we can find finite positive constants $b_1,b_2$ such that $b_1\le e^{\theta^Tz}g(x\exp[\theta^Tz])/g(x)\le b_2$ for $\mu$ a.e. $z$ and $x\ge0$. The claim follows by setting $m_1(\varepsilon)=b_1\min(c-\varepsilon,k^{-1})$ and $m_2(\varepsilon)=b_2\max(c+\varepsilon,k)$, where $k=k(d)>0$ and $d>0$ are such that $c-\varepsilon<\alpha_0(x)/g(x)<c+\varepsilon$ for $x>d$, and $k^{-1}\le\alpha_0(x)/g(x)\le k$ for $x\le d$.

In the case of the half-logistic distribution, we choose $g(x)=1$. The function $g(x)=1+x$ applies to the half-normal scale regression, while the choice $g(x)=(1+n^{-1}x)^{-1}$ applies to the half-$t_n$ scale regression model. Of course, in the case of the gamma and inverse Gaussian frailty models (with fixed frailty parameters) and the linear hazard model, the choice of the function $g(x)$ is obvious.

In the case of polynomial hazards $\alpha_0(x)=1+\sum_{p=1}^m a_px^p$, $m\ge1$, where the $a_p$ are fixed nonnegative coefficients and $a_m>0$, we choose $g(x)=[1+a_mx]^m$. Note, however, that polynomial hazards may also be well defined when some of the coefficients $a_p$ are negative. We do not know under what conditions polynomial hazards define regular parametric models, but we expect that in such models the parameters are estimated subject to added constraints in both the parametric and the semiparametric setting. Evidently, our results do not apply to such complicated problems.

The choice $g(x)=[1+\eta x]^\xi$, $\xi<-1$, was excluded in this example because it forms a defective hazard rate. Gronwall's inequalities in [3] show that hazard rates of the form $\exp(\theta^Tz)[1+x\exp(\theta^Tz)]^\xi$, $\xi<-1$, lead to a Volterra equation whose solution exists only on a finite interval dependent on $\theta$, which may be strictly contained in the support of the withdrawal time distribution. Our results do not apply to this setting.
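To see why $\xi<-1$ gives a defective hazard, note that the cumulative hazard of $g(x)=[1+\eta x]^\xi$ is $\{[1+\eta x]^{\xi+1}-1\}/\{\eta(\xi+1)\}$ for $\xi\neq-1$ and $\eta^{-1}\log(1+\eta x)$ for $\xi=-1$. For $\xi<-1$ it converges to the finite limit $1/\{\eta|\xi+1|\}$, so the survival function does not tend to 0, whereas for $\xi\ge-1$ it diverges. The sketch evaluates these closed forms; the function names and parameter values are illustrative.

```python
import math

def cumulative_g(x, eta, xi):
    """Cumulative hazard of g(x) = (1 + eta x)^xi, eta > 0."""
    if xi == -1.0:
        return math.log1p(eta * x) / eta
    return ((1.0 + eta * x) ** (xi + 1.0) - 1.0) / (eta * (xi + 1.0))

def defective_limit(eta, xi):
    """Limit of the cumulative hazard as x -> infinity (infinite unless xi < -1)."""
    return 1.0 / (eta * abs(xi + 1.0)) if xi < -1.0 else math.inf
```

For example, with $\eta=0.5$ the value of `cumulative_g` at $x=10^8$ is already within $10^{-6}$ of the limit 2 when $\xi=-2$, while it keeps growing for $\xi=-1$ and $\xi=-0.5$.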

4. Discussion 

In Section 2 we discussed properties of the estimate of the unknown transformation under no special regularity conditions on the model representing the "true" distribution $P$ of the data. The examples of Section 3 show that the class of transformation models to which these results apply is quite large and allows the hazards of core models to have a variety of shapes.

To estimate the unknown Euclidean parameter $\theta$, we made the additional assumption that the failure and censoring times are conditionally independent given the covariates and that the failure times follow the transformation model. These conditions are sufficient to ensure that the score process is asymptotically unbiased and that the solution to the score equation forms a consistent estimate of the "true" parameter $\theta_0$. However, only the first two moments of certain stochastic integrals are used for this purpose in Section 7, so the results may also be valid under different assumptions on the true distribution $P$.

We also showed that the class of M-estimators includes a special choice corresponding to an estimate whose asymptotic variance is equal to the inverse of the asymptotic variance of the score function $\sqrt n\,U_n(\theta_0)$. In [ljj we show that this estimate is asymptotically efficient. Therein we discuss alternative ad hoc estimators of the unknown transformation and consider a larger class of M-estimators, which allows one to adjust common inefficient estimates of the parameter $\theta$ to efficient one-step MLE estimates. Note that the asymptotic variance of an M-estimator is usually of "sandwich" form:

$$\operatorname{As.var}\sqrt n(\hat\theta-\theta_0)=A^{-1}(\theta_0)\,\operatorname{As.var}\sqrt n\,U_n(\theta_0)\,[A^T(\theta_0)]^{-1},$$

where $A(\theta)$ is the limit in probability of the derivative of $U_n(\theta)$ with respect to $\theta$. However, it is quite common that estimators derived from conditional likelihoods of type (2.6) satisfy $A(\theta_0)=\operatorname{As.var}\sqrt n\,U_n(\theta_0)$ but are inefficient, so the results of Proposition 2.3 do not imply asymptotic efficiency of the corresponding estimate of the parameter $\theta$.
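The sandwich formula is straightforward to evaluate once estimates of $A(\theta_0)$ and $\operatorname{As.var}\sqrt n\,U_n(\theta_0)$ are available. This NumPy sketch, with a hypothetical function name and made-up inputs, computes $A^{-1}VA^{-T}$ through linear solves rather than explicit inverses, assuming $V$ is symmetric; when $A=V$ it collapses to $V^{-1}$, which is exactly the situation described in the last sentence and says nothing by itself about efficiency.

```python
import numpy as np

def sandwich(A, V):
    """A^{-1} V A^{-T} for a square matrix A and a symmetric variance matrix V."""
    left = np.linalg.solve(A, V)          # A^{-1} V
    return np.linalg.solve(A, left.T)     # A^{-1} (A^{-1} V)^T = A^{-1} V A^{-T}
```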

The proofs of Propositions 2.1 and 2.2 are based on empirical and U-process techniques and are given in Sections 6 and 7. The next section collects some auxiliary results. The proof of consistency and weak convergence of the estimate of the unknown transformation also relies on Gronwall's inequalities collected in Section 9. The proof of Proposition 2.3 uses the Fredholm determinant formula for resolvents of linear integral equations [25].

5. Some auxiliary results 

We denote by $P_n=n^{-1}\sum_{i=1}^n\varepsilon_{(X_i,\delta_i,Z_i)}$ the empirical measure corresponding to a sequence of $n$ iid observations $(X_i,\delta_i,Z_i)$ representing withdrawal times, censoring indicators and covariates. Set $N_\cdot(t)=n^{-1}\sum_{i=1}^n1(X_i\le t,\delta_i=1)$ and $Y_\cdot(t)=n^{-1}\sum_{i=1}^n1(X_i\ge t)$. Further, let $\|\cdot\|$ be the supremum norm in the space $\ell^\infty([0,\tau]\times\Theta)$, and let $\|\cdot\|_\infty$ be the supremum norm in $\ell^\infty([0,\tau])$. We assume that the point $\tau$ satisfies Condition 2.0(iii).
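For concreteness, this sketch computes $N_\cdot(t)$ and $Y_\cdot(t)$ from a sample of pairs $(X_i,\delta_i)$ and, from them, the Aalen–Nelson estimate $\int_0^tN_\cdot(du)/Y_\cdot(u)$ of the cumulative hazard of the observations, which we take to be the quantity $A_n$ appearing in the proof of Lemma 5.1. The data, the function names and the treatment of ties (jumps at each distinct uncensored time are summed) are our assumptions.

```python
def counting_processes(X, delta):
    """Return N_dot, Y_dot and the Aalen-Nelson estimate as functions of t."""
    n = len(X)

    def N_dot(t):
        return sum(1 for x, d in zip(X, delta) if x <= t and d == 1) / n

    def Y_dot(t):
        return sum(1 for x in X if x >= t) / n

    def aalen_nelson(t):
        total = 0.0
        for u in sorted({x for x, d in zip(X, delta) if d == 1 and x <= t}):
            jump = sum(1 for x, d in zip(X, delta) if x == u and d == 1) / n   # N_dot({u})
            total += jump / Y_dot(u)
        return total

    return N_dot, Y_dot, aalen_nelson
```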
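Define

$$R_n(t,\theta)=\int_0^t\frac{N_\cdot(du)}{S(\Gamma_\theta(u-),\theta,u)}-\int_0^t\frac{EN(du)}{s(\Gamma_\theta(u-),\theta,u)},$$
$$R_{pn}(t,\theta)=\int_0^th(\Gamma_\theta(u-),\theta,u)\,[N_\cdot-EN](du),\quad p=5,6,$$
$$R_{pn}(t,\theta)=\int_0^tH_n(\Gamma_\theta(u-),\theta,u)\,N_\cdot(du)-\int_0^th(\Gamma_\theta(u-),\theta,u)\,EN(du),\quad p=7,8,$$
$$R_{9n}(t,\theta)=\int_{[0,t)}EN(du)\,\Big|\int_{(u,t]}V_\theta(u,w)\,R_{5n}(dw,\theta)\Big|,$$
$$R_{10n}(t,\theta)=\int_0^t\sqrt n\,R_n(u-,\theta)\,R_{5n}(du,\theta),$$
$$R_{pn}(t,\theta)=\int_0^tH_n(\Gamma_{n\theta}(u-),\theta,u)\,N_\cdot(du)-\int_0^th(\Gamma_\theta(u-),\theta,u)\,EN(du),\quad p=11,12,$$
$$B_{pn}(t,\theta)=\int_0^tF_{pn}(u,\theta)\,R_n(du,\theta),\quad p=1,2,$$

where $V_\theta(u,w)$ is given by (2.3). In addition, $H_n=K'_n$ for $p=7$ or $p=11$, $H_n=K_n$ for $p=8$ or $p=12$, $h=k'$ for $p=5$, $7$ or $11$, and $h=k$ for $p=6$, $8$ or $12$. Here $k'=-[s'/s^2]$, $k=-[\dot s/s^2]$, $K'_n=-[S'/S^2]$ and $K_n=-[\dot S/S^2]$. Further, set $F_{1n}(u,\theta)=[S-ES](\Gamma_\theta(u),\theta,u)$ and $F_{2n}(u,\theta)=[S'-ES'](\Gamma_\theta(u),\theta,u)$.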

Lemma 5.1. Suppose that Conditions 2.0 and 2.1 are satisfied.

(i) $\sqrt n\,R_n(t,\theta)$ converges weakly in $\ell^\infty([0,\tau]\times\Theta)$ to a mean zero Gaussian process $R$ whose covariance function is given below.

(ii) $\|R_{pn}\|\to0$ a.s. for $p=5,\dots,12$.

(iii) $\sqrt n\,\|B_{pn}\|\to0$ a.s. for $p=1,2$.

(iv) The processes $V_n(\Gamma_\theta(t-),\theta,t)$ and $V_n(\Gamma_\theta(t),\theta,t)$, where $V_n=S/s-1$, satisfy $\|V_n\|=O(b_n)$ a.s. In addition, $\|V_n\|\to0$ a.s. for $V_n=[S'-s']/s$, $[S''-s'']/s$, $[\dot S-\dot s]/s$ and $[\dot S'-\dot s']/s$.

Proof. The Volterra identity (2.2), which defines $\Gamma_\theta$ as a parameter dependent on $P$, is used below to compute the asymptotic covariance function of the process $R_{1n}$. In Section 6 we show that the solution to the identity (2.2) is unique and, for some positive constants $d_0,d_1,d_2$, we have

$$\Gamma_\theta(t)\le d_0A_P(t),\qquad |\Gamma_\theta(t)-\Gamma_{\theta'}(t)|\le|\theta-\theta'|\,d_1\exp[d_2A_P(t)],$$
$$|\Gamma_\theta(t)-\Gamma_\theta(t')|\le d_0\,|A_P(t)-A_P(t')|\le\frac{d_0}{E_PY(\tau)}\,P\big(X\in(t\wedge t',t\vee t'],\ \delta=1\big), \tag{5.1}$$

with similar inequalities holding for the left continuous version of $\Gamma_\theta=\Gamma_{\theta,P}$. Here $A_P(t)$ is the cumulative hazard function corresponding to the observations $(X,\delta)$.
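To show part (i), we use a quadratic expansion similar to the expansion of the ordinary Aalen–Nelson estimator in [19]. We have $R_n=\sum_{j=1}^4R_{jn}$, where

$$R_{1n}(t,\theta)=\frac1n\sum_{i=1}^n\int_0^t\Big[\frac{N_i(du)}{s(\Gamma_\theta(u-),\theta,u)}-\frac{S_i}{s^2}(\Gamma_\theta(u-),\theta,u)\,EN(du)\Big],$$
$$R_{2n}(t,\theta)=-\frac1{n^2}\sum_{i\neq j}\int_0^t\frac{S_i-s}{s^2}(\Gamma_\theta(u-),\theta,u)\,[N_j-EN](du),$$
$$R_{3n}(t,\theta)=-\frac1{n^2}\sum_{i=1}^n\int_0^t\frac{S_i-s}{s^2}(\Gamma_\theta(u-),\theta,u)\,[N_i-EN](du),$$
$$R_{4n}(t,\theta)=\int_0^t\Big[\frac Ss-1\Big]^2(\Gamma_\theta(u-),\theta,u)\,\frac{N_\cdot(du)}{S(\Gamma_\theta(u-),\theta,u)},$$

and $S_i(\Gamma_\theta(u-),\theta,u)=Y_i(u)\,\alpha(\Gamma_\theta(u-),\theta,Z_i)$.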

The term $R_{3n}$ has expectation of order $O(n^{-1})$. Using Conditions 2.1, it is easy to verify that $R_{2n}$ and $n[R_{3n}-ER_{3n}]$ form canonical U-processes of degree 2 and 1 over Euclidean classes of functions with square integrable envelopes. We have $\|R_{2n}\|=O(b_n^2)$ and $n\|R_{3n}-ER_{3n}\|=O(b_n)$ almost surely, by the law of the iterated logarithm for canonical U-processes [1]. The term $R_{4n}$ can be bounded by $\|R_{4n}\|\le\|[S/s]-1\|^2\,m_1^{-1}A_n(\tau)$. But for a point $\tau$ satisfying Condition 2.0(iii), we have $A_n(\tau)=A(\tau)+O(b_n)$ a.s. Therefore part (iv) below implies that $\sqrt n\,\|R_{4n}\|\to0$ a.s.
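The decomposition rests on an exact algebraic identity, valid for positive $S$ and $s$:

$$\frac1S=\frac1s-\frac{S-s}{s^2}+\frac{(S-s)^2}{s^2S}.$$

Integrating the first two terms against $N_\cdot(du)$ and splitting $N_\cdot=EN+[N_\cdot-EN]$ produces $R_{1n}$, $R_{2n}$ and $R_{3n}$ (the diagonal $i=j$ and off-diagonal $i\neq j$ parts of the double sum), while the last term gives $R_{4n}$.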

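The term $R_{1n}$ decomposes into the sum $R_{1n}=R_{1n;1}-R_{1n;2}$, where

$$R_{1n;1}(t,\theta)=\frac1n\sum_{i=1}^n\int_0^t\frac{N_i(du)-Y_i(u)\,A(du)}{s(\Gamma_\theta(u-),\theta,u)},\qquad R_{1n;2}(t,\theta)=\int_0^tG(u,\theta)\,C_\theta(du),$$

and $G(u,\theta)=S(\Gamma_\theta(u-),\theta,u)-s(\Gamma_\theta(u-),\theta,u)\,Y_\cdot(u)/EY(u)$. The Volterra identity (2.2) implies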

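$$n\operatorname{cov}\big(R_{1n;1}(t,\theta),R_{1n;1}(t',\theta')\big)=\int_0^{t\wedge t'}\frac{[1-A(\Delta u)]\,\Gamma_\theta(du)}{s(\Gamma_{\theta'}(u-),\theta',u)},$$
$$n\operatorname{cov}\big(R_{1n;1}(t,\theta),R_{1n;2}(t',\theta')\big)=\int_0^t\!\!\int_0^{u\wedge t'}E[\alpha(\Gamma_{\theta'}(v-),\theta',Z)\mid X=u,\delta=1]\,C_{\theta'}(dv)\,\Gamma_\theta(du)-\int_0^t\!\!\int_0^{u\wedge t'}E[\alpha(\Gamma_{\theta'}(v-),\theta',Z)\mid X\ge u]\,C_{\theta'}(dv)\,\Gamma_\theta(du),$$
$$n\operatorname{cov}\big(R_{1n;2}(t,\theta),R_{1n;2}(t',\theta')\big)=\int_0^t\!\!\int_0^{t'\wedge u}f(u,v,\theta,\theta')\,C_\theta(du)\,C_{\theta'}(dv)+\int_0^{t'}\!\!\int_0^{t\wedge v}f(v,u,\theta',\theta)\,C_\theta(du)\,C_{\theta'}(dv)-\int_0^{t\wedge t'}f(u,u,\theta,\theta')\,C_{\theta'}(\Delta u)\,C_\theta(du),$$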
where $f(u,v,\theta,\theta')=EY(u)\operatorname{cov}\big(\alpha(\Gamma_\theta(u-),\theta,Z),\alpha(\Gamma_{\theta'}(v-),\theta',Z)\mid X\ge u\big)$. Using the CLT and the Cramér–Wold device, the finite dimensional distributions of $\sqrt n\,R_{1n}(t,\theta)$ converge to the finite dimensional distributions of a Gaussian process. The process $R_{1n}$ can be represented as $R_{1n}(t,\theta)=[P_n-P]h_{t,\theta}$, where $\mathcal H=\{h_{t,\theta}(x,\delta,z):t\le\tau,\theta\in\Theta\}$ is a class of functions such that each $h_{t,\theta}$ is a linear combination of 4 functions having a square integrable envelope, each of them monotone with respect to $t$ and Lipschitz continuous with respect to $\theta$. This is a Euclidean class of functions [29], and $\{\sqrt n\,R_{1n}(t,\theta):\theta\in\Theta,t\le\tau\}$ converges weakly in $\ell^\infty([0,\tau]\times\Theta)$ to a tight Gaussian process. The process $\sqrt n\,R_{1n}(t,\theta)$ is asymptotically equicontinuous with respect to the variance semimetric $\rho$. The function $\rho$ is continuous, except for discontinuity hyperplanes corresponding to a finite number of discontinuity points of $EN$. By the law of the iterated logarithm, we also have $\|R_{1n}\|=O(b_n)$ a.s.

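Remark 5.1. Under Condition 2.2, we have the identity

$$n\,\mathrm{cov}\big(R_{1n;2}(t,\theta_0),R_{1n;2}(t',\theta_0)\big) = \sum_{p=1}^2 n\,\mathrm{cov}\big(R_{1n;p}(t,\theta_0),R_{1n;3-p}(t',\theta_0)\big) - \int_{[0,t\wedge t']} EY(u)\,\mathrm{var}\big(\alpha(\Gamma_{\theta_0}(u-),\theta_0,Z)\mid X\ge u\big)\,C_{\theta_0}(\Delta u)\,C_{\theta_0}(du).$$

Here $\theta_0$ is the true parameter of the transformation model. Therefore, using the assumption of continuity of the function $EN$ and adding up all terms,

$$n\,\mathrm{cov}\big(R_{1n}(t,\theta_0),R_{1n}(t',\theta_0)\big) = n\,\mathrm{cov}\big(R_{1n;1}(t,\theta_0),R_{1n;1}(t',\theta_0)\big) = C_{\theta_0}(t\wedge t').$$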

Next set $b_\theta(u) = h(\Gamma_\theta(u-),\theta,u)$, with $h = k'$ or $h = \dot k$. Then $\int_0^t b_\theta(u)\,N_\cdot(du) = P_n f_{t,\theta}$, where $f_{t,\theta} = 1(X\le t,\delta = 1)\,h(\Gamma_\theta(X\wedge t-),\theta,X\wedge t-)$. Conditions 2.1 and the inequalities (5.1) imply that the class of functions $\{f_{t,\theta}: t\le\tau,\theta\in\Theta\}$ is Euclidean for a bounded envelope, for it forms a product of a VC-subgraph class and a class of Lipschitz continuous functions with a bounded envelope. The almost sure convergence of the terms $R_{pn}$, $p = 5,6$, follows from the Glivenko-Cantelli theorem.

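Next, set $b_\theta(u) = k'(\Gamma_\theta(u-),\theta,u)$ for short. Using the Fubini theorem and $|V_\theta(u,w)| \le \exp[\int_u^w |b_\theta(s)|\,EN(ds)]$, we obtain

$$\begin{aligned} R_{9n}(t,\theta) &\le \int_{(0,t)} EN(du)\,|R_{5n}(t,\theta)-R_{5n}(u,\theta)| + \int_{(0,t)} EN(du)\,\Big|\int_{(u,t]} V_\theta(u,s-)\,b_\theta(s)\,EN(ds)\,[R_{5n}(t,\theta)-R_{5n}(s,\theta)]\Big|\\ &\le 2\|R_{5n}\|\int_{[0,t)} EN(du)\Big[1+\int_{(u,t]} |V_\theta(u,w-)|\,|b_\theta(w)|\,EN(dw)\Big]\\ &\le 2\|R_{5n}\|\int_{[0,t)} EN(du)\,\exp\Big[\int_u^t |b_\theta|(s)\,EN(ds)\Big] \to 0 \quad\text{a.s.}\end{aligned}$$

uniformly in $t\le\tau$, $\theta\in\Theta$.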
Further, we have $R_{10n}(t,\theta) = \sum_{p=1}^4 R_{10n;p}(t,\theta)$, where

$$R_{10n;p}(t,\theta) = \int_0^t R_{pn}(u-,\theta)\,R_{5n}(du;\theta) = \int_0^t R_{pn}(u-;\theta)\,k'(\Gamma_\theta(u-),\theta,u)\,[N_\cdot - EN](du).$$

We have $\|\sqrt{n}R_{10n;p}\| = O(1)\sup_{\theta,u}|\sqrt{n}R_{pn}(u-,\theta)| \to 0$ a.s. for $p = 2,3,4$. Moreover, $\sqrt{n}R_{10n;1}(t;\theta) = \sqrt{n}R_{10n;11}(t;\theta) + \sqrt{n}R_{10n;12}(t;\theta)$, where $R_{10n;11}$ is equal to




while $R_{10n;12}(t,\theta)$ is the same sum taken over indices $i = j$. These are U-processes over Euclidean classes of functions with square integrable envelopes. By the law of the iterated logarithm, we have $\|R_{10n;11}\| = O(b_n^2)$ and $n\|R_{10n;12} - ER_{10n;12}\| = O(b_n)$ a.s. We also have $ER_{10n;12}(t,\theta) = O(1/n)$ uniformly in $\theta\in\Theta$ and $t\le\tau$.

The analysis of the terms $B_{1n}$ and $B_{2n}$ is quite similar. Suppose that $\ell'(x,\theta)\neq 0$. We have $B_{2n} = \sum_{p=1}^4 B_{2n;p}$, where in the term $B_{2n;p}$ integration is with respect to $R_{pn}$. For $p = 1$, we obtain $B_{2n;1} = B_{2n;11} + B_{2n;12}$, where

B2n;ii{t,6) = \ f y"[Sl-eSi]Fo(u),6,u)R$(du,0), 

whereas the term $B_{2n;12}$ represents the same sum taken over indices $i = j$. These are U-processes over Euclidean classes of functions with square integrable envelopes. By the law of the iterated logarithm [1], we have $\|B_{2n;11}\| = O(b_n^2)$ and $n\|B_{2n;12} - EB_{2n;12}\| = O(b_n)$ a.s. We also have $EB_{2n;12}(t,\theta) = O(1/n)$ uniformly in $\theta\in\Theta$ and $t\le\tau$. Thus $\sqrt{n}\|B_{2n;1}\|\to 0$ a.s. A similar analysis, leading to U-statistics of degree 1, 2, 3, can be applied to the integrals $\sqrt{n}B_{2n;p}(t,\theta)$, $p = 2,3$. On the other hand, assumption 2.1 implies that for $p = 4$ we have the bound

$$|B_{2n;4}(t,\theta)| \le 2\int_0^t \psi(A_2(u-))\,\frac{(S-s)^2}{s}\,EN(du) \le O(1)\int_0^\tau \frac{(S-s)^2}{s}\,EN(du),$$

where, under Condition 2.1, the function $\psi$ bounding $\ell'$ is either a constant $c$ or a bounded decreasing function (thus bounded by some $c$). The right-hand side can be further expanded to verify that $\|\sqrt{n}B_{2n;4}\|\to 0$ a.s. Alternatively, we can use part (iv).

A similar expansion can also be applied to show that $\|R_{7n}\|\to 0$ a.s. Alternatively, we have $|R_{7n}(t,\theta)| \le \int_0^t|K'_n-k'|(\Gamma_\theta(u-),\theta,u)\,N_\cdot(du) + |R_{5n}(t,\theta)|$, and by part (iv) we have uniform almost sure convergence of the term $R_{7n}$. We also have $|R_{11n}-R_{7n}|(t,\theta) \le \int_0^t O(|\Gamma_{n\theta}-\Gamma_\theta|)(u)\,N_\cdot(du)$ a.s., so that part (i) implies $\|R_{11n}\|\to 0$ a.s. The terms $R_{8n}$ and $R_{12n}$ can be handled analogously.

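Next, $[S/s](\Gamma_\theta(t-),\theta,t) = P_nf_{\theta,t}$, where

$$f_{\theta,t}(x,\delta,z) = 1(x\ge t)\,\frac{\alpha(\Gamma_\theta(t-),\theta,z)}{E[Y(t)\,\alpha(\Gamma_\theta(t-),\theta,Z)]} = 1(x\ge t)\,g_{\theta,t}(z).$$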

Suppose that Condition 2.1 is satisfied by a decreasing function $\psi$ and an increasing function $\psi_1$. The inequalities (5.1) and Condition 2.1 imply that $|g_{\theta,t}(Z)| \le m_2[m_1E_PY(\tau)]^{-1}$, $|g_{\theta,t}(Z) - g_{\theta',t}(Z)| \le |\theta-\theta'|\,h_1(\tau)$ and $|g_{\theta,t}(Z) - g_{\theta,t'}(Z)| \le [P(X\in[t\wedge t',t\vee t')) + P(X\in(t\wedge t',t\vee t'],\delta = 1)]\,h_2(\tau)$, where

$$h_1(\tau) = 2m_2[m_1E_PY(\tau)]^{-1}\big[\psi_1(d_0A_P(\tau)) + \psi(0)\,d_1\exp[d_2A_P(\tau)]\big], \qquad h_2(\tau) = m_2[m_1E_PY(\tau)]^{-2}[m_2 + 2\psi(0)].$$



Setting $h(\tau) = \max[h_1(\tau),h_2(\tau),m_2(m_1E_PY(\tau))^{-1}]$, it is easy to verify that the class of functions $\{f_{\theta,t}(x,\delta,z)/h(\tau): \theta\in\Theta, t\le\tau\}$ is Euclidean for a bounded envelope. The law of the iterated logarithm for empirical processes over Euclidean classes of functions [1] therefore implies that part (iii) is satisfied by the process $V = S/s - 1$.
For the remaining choices of the V processes the proof is analogous and follows 
from the Glivenko-Cantelli theorem for Euclidean classes of functions [1^] . □ 

6. Proof of Proposition 2.1 
6.1. Part (i) 

For $P\in\mathcal{P}$, let $A(t) = A_P(t)$ be given by (2.1) and let $\tau_0 = \sup\{t: E_PY(t) > 0\}$. Condition 2.1(ii) assumes that there exist constants $m_1 < m_2$ such that the hazard rate $\alpha(x,\theta|z)$ is bounded from below by $m_1$ and from above by $m_2$. Put $A_1(t) = m_1^{-1}A(t)$ and $A_2(t) = m_2^{-1}A(t)$. Then $A_2\le A_1$. Further, Condition 2.1(iii) assumes that the function $\ell(x,\theta,z) = \log\alpha(x,\theta,z)$ has a derivative $\ell'(x,\theta,z)$ with respect to $x$ satisfying $|\ell'(x,\theta,z)|\le\psi(x)$ for some bounded decreasing function $\psi$. Suppose that $\psi\le c$ and define $\rho(t) = \max(c,1)A_1(t)$. Finally, the derivative $\dot\ell(x,\theta,z)$ with respect to $\theta$ satisfies $|\dot\ell(x,\theta,z)|\le\psi_1(x)$ for some bounded function $\psi_1$, or for a function $\psi_1$ that is continuous, strictly increasing, bounded at the origin and satisfies $\int\psi_1(x)^2e^{-x}\,dx<\infty$. Let



In the inequalities (5.1) of Lemma 5.1 we take $d_0 = m_1^{-1}$, $d_1 = \max(1,c)$ and $d_2 = d$.

Let $T = [0,\tau_0]$ if $\tau_0$ is a discontinuity point of the survival function $E_PY(t)$, and let $T = [0,\tau_0)$ if $\tau_0$ is a continuity point of this survival function. Consider the set of functions $X(P) = \{g: g\text{ monotone increasing}, e^{-g}\in D(T), g\ll E_PN, A_2\le g\le A_1\}$. Since for each $g\in X(P)$ the function $e^{-g}$ is a subsurvival function satisfying $\exp[-A_1]\le\exp[-g]\le\exp[-A_2]$, we can consider $X(P)$ as a subset of $D(T)$, endowed with the supremum norm. Next, for $\tau<\tau_0$, let $X(P,\tau)\subset D([0,\tau])$ consist of the functions $g\in X(P)$ restricted to the interval $[0,\tau]$. For fixed $\theta\in\Theta$ and $g\in X(P,\tau)$, define



Using the bounds $A_2\le g\le A_1$, it is easy to verify that for fixed $\theta\in\Theta$, $\Psi_\theta$ maps $X(P,\tau)$ into itself. Since $A_P(0-) = 0$, we have $g(0-) = 0$ and $\Psi_\theta(g)(0-) = 0$ as well.

Consider the equation $\Psi_\theta(g) = g$, $g(0-) = 0$. Using the Helly selection theorem, it is easy to verify that for fixed $\theta\in\Theta$, the operator $\Psi_\theta$ maps $X(P,\tau)$ into itself, is continuous (with respect to $g$) and has compact range. Since $X(P,\tau)$ forms a bounded, closed, convex set of functions, Schauder's fixed point theorem implies that $\Psi_\theta$ has a fixed point in $X(P,\tau)$.
To show uniqueness of the solution and its continuity with respect to $\theta$, we first consider the case of a continuous function $EN(t)$. Then $X(P,\tau)\subset C([0,\tau])$. Define a norm in $C([0,\tau])$ by setting $\|x\|_\tau^\rho = \sup_{t\le\tau}e^{-\rho(t)}|x(t)|$. Then $\|\cdot\|_\tau^\rho$ is equivalent to the sup norm in $C([0,\tau])$. For $g,g'\in X(P,\tau)$ and $\theta\in\Theta$, we have

$$\begin{aligned}|\Psi_\theta(g)-\Psi_\theta(g')|(t) &\le \int_0^t|g-g'|(u)\,\psi(A_2(u))\,A_1(du)\\ &\le \int_0^t|g-g'|(u)\,\rho(du) \le \|g-g'\|_\tau^\rho\int_0^te^{\rho(u)}\rho(du)\\ &\le \|g-g'\|_\tau^\rho\,e^{\rho(t)}\big(1-e^{-\rho(t)}\big),\end{aligned}$$

and hence $\|\Psi_\theta(g)-\Psi_\theta(g')\|_\tau^\rho\le\|g-g'\|_\tau^\rho\,(1-e^{-\rho(\tau)})$. For any $g\in X(P,\tau)$ and $\theta,\theta'\in\Theta$, we also have

$$\begin{aligned}|\Psi_\theta(g)-\Psi_{\theta'}(g)|(t) &\le |\theta-\theta'|\int_0^t\psi_1(g(u))\,A_1(du)\\ &\le |\theta-\theta'|\int_0^t\psi_1(\rho(u))\,\rho(du)\\ &\le |\theta-\theta'|\,e^{\rho(t)}\int_0^t\psi_1(\rho(u))\,e^{-\rho(u)}\rho(du) \le |\theta-\theta'|\,e^{\rho(t)}d,\end{aligned}$$

so that $\|\Psi_\theta(g)-\Psi_{\theta'}(g)\|_\tau^\rho\le|\theta-\theta'|\,d$. It follows that $\{\Psi_\theta:\theta\in\Theta\}$, restricted to $C([0,\tau])$, forms a family of continuously contracting mappings. The Banach fixed point theorem for continuously contracting mappings [24] therefore implies that there exists a unique solution $\Gamma_\theta$ to the equation $\Psi_\theta(g)(t) = g(t)$ for $t\le\tau$, and this solution is continuous in $\theta$. Since $A(0) = A(0-) = 0$ and the solution is bounded between two multiples of $A(t)$, we also have $\Gamma_\theta(0) = 0$.
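As a numerical illustration of the contraction argument in the continuous case, the following Python sketch iterates $g\mapsto\Psi_\theta(g)$ on a grid. Everything concrete in it is a hypothetical choice made only for illustration: a binary covariate, $\alpha(x,\theta|z) = e^{\theta z}(1+0.5e^{-x})$ (so that $m_1 = 1$ and $|\ell'|\le 1/3$), made-up smooth functions standing in for $E[Y(u)1(Z=z)]$ and for the density of $EN$, and a left-endpoint discretisation of the integral defining $\Psi_\theta$. Under these assumptions the weighted distances $\|g_{k+1}-g_k\|_\tau^\rho$ should decrease at least by the factor $1-e^{-\rho(\tau)}$.

```python
import math

# Hypothetical ingredients (not taken from the text): binary covariate, hazard shape,
# at-risk weights E[Y(u)1(Z=z)] and density of EN on [0, TAU].
THETA = 0.5
TAU = 2.0
N_GRID = 200
GRID = [TAU * k / N_GRID for k in range(N_GRID + 1)]
DU = TAU / N_GRID


def alpha(x, theta, z):
    # Here alpha >= exp(theta*z) >= 1 and |d/dx log alpha| <= 1/3 for x >= 0.
    return math.exp(theta * z) * (1.0 + 0.5 * math.exp(-x))


def at_risk(u, z):
    return 0.5 * math.exp(-(1.0 + z) * u)


def en_density(u):
    return 0.3 * math.exp(-u)


def psi_map(g):
    """Left-endpoint version of Psi_theta(g)(t) = int_0^t EN(du) / E[Y(u) alpha(g(u), theta, Z)]."""
    out, acc = [0.0], 0.0
    for j in range(N_GRID):
        u = GRID[j]
        denom = sum(at_risk(u, z) * alpha(g[j], THETA, z) for z in (0, 1))
        acc += en_density(u) * DU / denom
        out.append(acc)
    return out


def rho_path():
    """rho(t) = max(c, 1) A_1(t) with c = 1/3 and m_1 = 1, discretised like psi_map."""
    out, acc = [0.0], 0.0
    for j in range(N_GRID):
        u = GRID[j]
        acc += en_density(u) * DU / sum(at_risk(u, z) for z in (0, 1))
        out.append(acc)
    return out


def weighted_norm(x, rho):
    # ||x||_tau^rho = sup_t exp(-rho(t)) |x(t)|, evaluated on the grid
    return max(math.exp(-r) * abs(v) for v, r in zip(x, rho))


def picard(tol=1e-13, max_iter=1000):
    """Iterate g -> Psi_theta(g) from g = 0, recording weighted step sizes."""
    g, rho, steps = [0.0] * (N_GRID + 1), rho_path(), []
    for _ in range(max_iter):
        new = psi_map(g)
        steps.append(weighted_norm([a - b for a, b in zip(new, g)], rho))
        g = new
        if steps[-1] < tol:
            break
    return g, rho, steps
```

The discretised map is itself of Volterra type, so in this sketch the iterates stabilise after at most `N_GRID + 1` steps; the weighted norm is what makes the geometric rate visible from the first step.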

Because $\|\cdot\|_\tau^\rho$ is equivalent to the supremum norm in $C([0,\tau])$, for fixed $\tau<\tau_0$ there exists a unique (in sup norm) solution to the equation, and the solution is continuous with respect to $\theta$. It remains to consider the behaviour of these functions at $\tau_0$. Fix $\theta\in\Theta$ again. If $A(\tau_0)<\infty$, then $\Gamma_\theta$ is unique on the whole interval $[0,\tau_0]$ (the preceding argument can be applied to the interval $[0,\tau_0]$). So let us consider the case $A(t)\uparrow\infty$ as $t\uparrow\tau_0$. If $\tau^{(1)}<\tau^{(2)}<\tau_0$, then the restrictions to $[0,\tau^{(1)}]$ of functions in $X(P,\tau^{(2)})$ belong to $X(P,\tau^{(1)})$. Let $\Gamma_\theta^{(p)}\in X(P,\tau^{(p)})$, $p = 1,2$, be the solutions obtained on the intervals $[0,\tau^{(1)}]$ and $[0,\tau^{(2)}]$, respectively. Then $\Gamma_\theta^{(2)}(t) = \Gamma_\theta^{(1)}(t)$ for $t\in[0,\tau^{(1)}]$. If $\tau^{(n)}\uparrow\tau_0$, then the inequalities $\exp[-A_1(\tau^{(n)})]\le\exp[-\Gamma_\theta^{(n)}(\tau^{(n)})]\le\exp[-A_2(\tau^{(n)})]$ imply $\Gamma_\theta^{(n)}(\tau^{(n)})\uparrow\infty$. Since this holds for any such sequence $\tau^{(n)}$, there exists a unique locally bounded solution to the equation on the interval $T = [0,\tau_0)$.

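Next let us consider the case of a discrete $EN(t)$ with a finite number of discontinuity points. In this case $A_P(\tau_0)$ is bounded and satisfies $A_P(0-) = 0$. Fix $\theta$. Using induction on the jumps, it is easy to verify that for any $g\in X(P,\tau_0)$ we have $\Psi_\theta(g)\in X(P,\tau_0)$, and that for any $g,g'\in X(P,\tau_0)$ the iterates $\Psi_\theta^{(m)}(g)$ and $\Psi_\theta^{(m)}(g')$ coincide, where $\Psi_\theta^{(m)}$ denotes the $m$-fold composition of $\Psi_\theta$ and $m$ is the number of discontinuity points. Hence $\Psi_\theta(\Psi_\theta^{(m)}(g)) = \Psi_\theta^{(m)}(g)$. Alternatively, the fact that the solution $\Gamma_\theta$ to the equation $\Psi_\theta(g) = g$, $g(0-) = 0$, is uniquely defined also follows from the recurrent formula

$$\Gamma_\theta(t) = \Gamma_\theta(t-) + EN(\Delta t)\,\big[E\,Y(t)\,\alpha(\Gamma_\theta(t-),\theta,Z)\big]^{-1}, \qquad \Gamma_\theta(0-) = 0.$$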
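The recurrent formula is straightforward to evaluate. The sketch below is a minimal Python illustration, assuming a hypothetical hazard rate $\alpha(x,\theta|z) = e^{\theta z}(1+0.5e^{-x})$, a covariate taking finitely many values, and made-up jump sizes $EN(\Delta t)$ and weights $E[Y(t)1(Z=z)]$ at the discontinuity points. It makes the bounds $A_2\le\Gamma_\theta\le A_1$ easy to check numerically.

```python
import math


def alpha(x, theta, z):
    # Hypothetical hazard rate alpha(x, theta | z) = exp(theta z) (1 + 0.5 e^{-x}).
    return math.exp(theta * z) * (1.0 + 0.5 * math.exp(-x))


def gamma_recursion(jumps, theta):
    """Gamma(t) = Gamma(t-) + EN(dt) / E[Y(t) alpha(Gamma(t-), theta, Z)], Gamma(0-) = 0.

    jumps: time-ordered list of (dEN, weights), where dEN is the jump of EN at a
    discontinuity point t and weights maps each covariate value z to E[Y(t) 1(Z = z)].
    Returns the values of Gamma at the discontinuity points.
    """
    gamma, path = 0.0, []
    for d_en, weights in jumps:
        # E[Y(t) alpha(Gamma(t-), theta, Z)] for a discrete covariate
        denom = sum(w * alpha(gamma, theta, z) for z, w in weights.items())
        gamma += d_en / denom
        path.append(gamma)
    return path
```

For instance, with `jumps = [(0.10, {0: 0.5, 1: 0.5}), (0.15, {0: 0.4, 1: 0.3}), (0.20, {0: 0.25, 1: 0.15})]` and `theta = 0.7`, every value of the returned path lies between $A(t)/m_2$ and $A(t)/m_1$, with $m_1 = 1$, $m_2 = 1.5e^{0.7}$ and $A(t) = \sum_{u\le t}EN(\Delta u)/EY(u)$.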
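For any $g\in X(P,\tau)$ and $\theta,\theta'\in\Theta$, we also have

$$\begin{aligned}|\Psi_\theta(g)-\Psi_{\theta'}(g)|(t-) &\le |\theta-\theta'|\int_{[0,t)}\psi_1(g(u-))\,A_1(du)\\ &\le |\theta-\theta'|\,e^{\rho(t-)}\int_{[0,t)}\psi_1(\rho(u-))\,e^{-\rho(u)}\rho(du)\\ &\le |\theta-\theta'|\,e^{\rho(t-)}d.\end{aligned}$$

To see the last inequality, define

$$\Psi_1(x) = \int_0^xe^{-y}\psi_1(y)\,dy.$$

Then $\Psi_1(\rho(t)) - \Psi_1(0) = \sum\rho(\Delta u)\,\psi_1(\rho(u^*))\exp[-\rho(u^*)]$, where the sum extends over the discontinuity points less than $t$, and $\rho(u^*)$ lies between $\rho(u-)$ and $\rho(u)$. The right-hand side is bounded from below by the corresponding sum $\sum\rho(\Delta u)\,\psi_1(\rho(u-))\exp[-\rho(u)]$, because $\psi_1(x)$ is increasing and $\exp(-x)$ is decreasing. Since the iterates of $\Psi_\theta$ started from any $g$ reach the fixed point $\Gamma_\theta$ after finitely many steps, we have $\sup_{t\le\tau_0}e^{-\rho(t-)}|\Gamma_\theta-\Gamma_{\theta'}|(t-)\le|\theta-\theta'|\,d$.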
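The lower bound for $\Psi_1(\rho(t))-\Psi_1(0)$ used above can be checked numerically for a pure-jump $\rho$. The sketch assumes, purely for illustration, $\psi_1(y) = y$, which is increasing with $\int\psi_1(y)^2e^{-y}\,dy<\infty$; then $\Psi_1(x) = 1-(1+x)e^{-x}$, and the jump sum is bounded by $\Psi_1(\infty) = \int_0^\infty\psi_1(y)e^{-y}\,dy = 1$.

```python
import math


def psi1(y):
    # Hypothetical increasing bound psi_1(y) = y, for which int y^2 e^{-y} dy is finite.
    return y


def Psi1(x):
    # Psi_1(x) = int_0^x e^{-y} psi_1(y) dy, which equals 1 - (1 + x) e^{-x} for psi_1(y) = y.
    return 1.0 - (1.0 + x) * math.exp(-x)


def jump_sum(jumps):
    """Return (sum_u rho(du) psi_1(rho(u-)) exp(-rho(u)), rho(t)) for a pure-jump rho."""
    rho, total = 0.0, 0.0
    for r in jumps:
        left = rho   # rho(u-)
        rho += r     # rho(u)
        total += r * psi1(left) * math.exp(-rho)
    return total, rho
```

Each summand is bounded by the integral of $e^{-y}\psi_1(y)$ over $[\rho(u-),\rho(u)]$, since on that interval $\psi_1(y)\ge\psi_1(\rho(u-))$ and $e^{-y}\ge e^{-\rho(u)}$.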
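Finally, for both the continuous and the discrete case, we have

$$\begin{aligned}|\Gamma_\theta-\Gamma_{\theta'}|(t) &\le |\Psi_\theta(\Gamma_\theta)-\Psi_\theta(\Gamma_{\theta'})|(t) + |\Psi_\theta(\Gamma_{\theta'})-\Psi_{\theta'}(\Gamma_{\theta'})|(t)\\ &\le \int_0^t|\Gamma_\theta-\Gamma_{\theta'}|(u-)\,\rho(du) + |\theta-\theta'|\int_0^t\psi_1(\rho(u-))\,\rho(du),\end{aligned}$$

and Gronwall's inequality (Section 9) yields

$$|\Gamma_\theta-\Gamma_{\theta'}|(t) \le |\theta-\theta'|\,e^{\rho(t)}\int_{(0,t]}\psi_1(\rho(u-))\,e^{-\rho(u-)}\rho(du) \le d\,|\theta-\theta'|\,e^{\rho(t)}.$$

Hence $\sup_{t\le\tau}e^{-\rho(t)}|\Gamma_\theta-\Gamma_{\theta'}|(t)\le|\theta-\theta'|\,d$. In the continuous case this holds for any $\tau<\tau_0$, in the discrete case for any $\tau\le\tau_0$.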

Remark 6.1. We have chosen the function $\rho$ as $\rho(t) = \max(c,1)A_1(t)$, where $c$ is a constant bounding the function $\ell'(x,\theta)$. Under Condition 2.1, this function may also be bounded by a continuous decreasing function $\psi$. The proof, assuming that $\rho(t) = \int_0^t\psi(A_2(u-))\,A_1(du)$, is quite similar. In the foregoing we consider the simpler choice, because in Proposition 2.2 we have assumed Condition 2.0(iii). Further, in the discrete case the assumption that the number of discontinuity points is finite is not needed, but the derivations are longer.

To show consistency of the estimate $\Gamma_{n\theta}$, we now assume that the point $\tau$ satisfies Condition 2.0(iii). Let $A_n(t)$ be the Aalen-Nelson estimator and set $A_{pn} = m_p^{-1}A_n$, $p = 1,2$. We have $A_{2n}(t)\le\Gamma_{n\theta}(t)\le A_{1n}(t)$ for all $\theta\in\Theta$ and $t\le\max(X_i, i = 1,\ldots,n)$. Setting $K_n(\Gamma_{n\theta}(u-),\theta,u) = S(\Gamma_{n\theta}(u-),\theta,u)^{-1}$, we have

$$\Gamma_{n\theta}(t)-\Gamma_\theta(t) = R_n(t,\theta) + \int_{(0,t]}\big[K_n(\Gamma_{n\theta}(u-),\theta,u) - K_n(\Gamma_\theta(u-),\theta,u)\big]\,N_\cdot(du).$$

Hence $|\Gamma_{n\theta}(t)-\Gamma_\theta(t)|\le|R_n(t,\theta)| + \int_0^t|\Gamma_{n\theta}-\Gamma_\theta|(u-)\,\rho_n(du)$, where $\rho_n = \max(c,1)A_{1n}$. Gronwall's inequality implies $\sup\exp[-\rho_n(t)]\,|\Gamma_{n\theta}-\Gamma_\theta|(t)\to 0$ a.s., where the supremum is over $\theta\in\Theta$ and $t\le\tau$. If $\tau_0$ is a discontinuity point of the survival function $E_PY(t)$, then this holds for $\tau = \tau_0$.
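For a finite sample, $\Gamma_{n\theta}$ can be computed by a recursion over the observed failure times. The Python sketch below assumes that $S(x,\theta,u) = n^{-1}\sum_iY_i(u)\alpha(x,\theta,Z_i)$ and $N_\cdot = n^{-1}\sum_iN_i$ (our reading of the notation, not a definition stated in this section), uses a hypothetical hazard rate $\alpha(x,\theta|z) = e^{\theta z}(1+0.5e^{-x})$, and computes the Aalen-Nelson estimator alongside, so that the bounds $A_{2n}\le\Gamma_{n\theta}\le A_{1n}$ can be verified on a toy data set.

```python
import math


def alpha(x, theta, z):
    # Hypothetical bounded hazard rate alpha(x, theta | z), for illustration only.
    return math.exp(theta * z) * (1.0 + 0.5 * math.exp(-x))


def estimate(data, theta):
    """Return (times, nelson_aalen, gamma_n) at the distinct uncensored times.

    data: list of (x, delta, z) with x the observed time and delta = 1 for a failure.
    gamma_n solves Gamma(t) = Gamma(t-) + N.(dt) / S(Gamma(t-), theta, t), where
    S(x, theta, t) = n^{-1} sum_i Y_i(t) alpha(x, theta, Z_i) (assumed form of S).
    """
    n = len(data)
    times = sorted({x for x, d, _ in data if d == 1})
    na, gam = 0.0, 0.0
    out_na, out_gam = [], []
    for t in times:
        at_risk = [z for x, _, z in data if x >= t]                    # Y_i(t) = 1(X_i >= t)
        d_n = sum(1 for x, d, _ in data if x == t and d == 1) / n     # N.(dt)
        y_dot = len(at_risk) / n                                       # Y.(t)
        s_val = sum(alpha(gam, theta, z) for z in at_risk) / n         # S(Gamma(t-), theta, t)
        na += d_n / y_dot
        gam += d_n / s_val
        out_na.append(na)
        out_gam.append(gam)
    return times, out_na, out_gam
```

With `data = [(1.0, 1, 0), (2.0, 1, 1), (2.0, 0, 0), (3.5, 1, 1), (4.0, 0, 1), (5.0, 1, 0)]` and `theta = 0.4`, the failure times are 1.0, 2.0, 3.5 and 5.0, and each value of `gamma_n` lies between the Aalen-Nelson value divided by $m_2 = 1.5e^{0.4}$ and divided by $m_1 = 1$.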
Next suppose that $\tau_0$ is a continuity point of this survival function, and let $T = [0,\tau_0)$. We have $\sup_{t\in T}|\exp[-A_{pn}(t)] - \exp[-A_p(t)]| = o_P(1)$. In addition, for any $\tau<\tau_0$, we have $\exp[-A_{1n}(\tau)]\le\exp[-\Gamma_{n\theta}(\tau)]\le\exp[-A_{2n}(\tau)]$. Standard monotonicity arguments imply $\sup_{t\in T}|\exp[-\Gamma_{n\theta}(t)] - \exp[-\Gamma_\theta(t)]| = o_P(1)$, because $\Gamma_\theta(\tau)\uparrow\infty$ as $\tau\uparrow\tau_0$.



6.2. Part (iii) 

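The process $W(t,\theta) = \sqrt{n}[\Gamma_{n\theta}-\Gamma_\theta](t)$ satisfies

$$W(t,\theta) = \sqrt{n}R_n(t,\theta) - \int_{[0,t]}W(u-,\theta)\,b^*_{n\theta}(u)\,N_\cdot(du),$$

where

$$b^*_{n\theta}(u) = \int_0^1(S'/S^2)\big(\Gamma_\theta(u-)+\lambda[\Gamma_{n\theta}-\Gamma_\theta](u-),\theta,u\big)\,d\lambda.$$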



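Define

$$\widetilde W(t,\theta) = \sqrt{n}R_n(t,\theta) - \int_0^t\widetilde W(u-,\theta)\,b_\theta(u)\,EN(du),$$

where $b_\theta(u) = [s'/s^2](\Gamma_\theta(u),\theta,u)$. We have

$$\widetilde W(t,\theta) = \sqrt{n}R_n(t,\theta) - \int_0^t\sqrt{n}R_n(u-,\theta)\,b_\theta(u)\,EN(du)\,V_\theta(u,t)$$

and

$$W(t,\theta)-\widetilde W(t,\theta) = -\int_0^t[W-\widetilde W](u-,\theta)\,b^*_{n\theta}(u)\,N_\cdot(du) + \mathrm{rem}(t,\theta),$$

where

$$\mathrm{rem}(t,\theta) = -\int_{[0,t]}\widetilde W(u-,\theta)\,\big[b^*_{n\theta}(u)\,N_\cdot(du) - b_\theta(u)\,EN(du)\big].$$

The remainder term is bounded by

$$\int_0^t|\widetilde W(u-,\theta)|\,|[b^*_{n\theta}-b_\theta](u)|\,N_\cdot(du) + R_{10n}(t,\theta) + \int_0^t|\sqrt{n}R_n(u-,\theta)|\,|b_\theta(u)|\,R_{9n}(du,\theta).$$

By noting that $R_{9n}(\cdot,\theta)$ is a nonnegative increasing process, we have $\|\mathrm{rem}\| = o_P(1) + \|R_{10n}\| + O_P(1)\|R_{9n}\| = o_P(1)$. Finally,

$$|W(t,\theta)-\widetilde W(t,\theta)| \le |\mathrm{rem}(t,\theta)| + \int_0^t|W-\widetilde W|(u-,\theta)\,\rho_n(du).$$

By Gronwall's inequality (Section 9), we have $W(t,\theta) = \widetilde W(t,\theta) + o_P(1)$ uniformly in $t\le\tau$, $\theta\in\Theta$. This verifies that the process $\sqrt{n}[\Gamma_{n\theta}-\Gamma_\theta]$ is asymptotically Gaussian under the assumption that the observations are iid, even when Condition 2.2 does not hold.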
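6.3. Part (ii)

Put

$$\dot\Gamma_{n\theta}(t) = \int_0^t\dot K_n(\Gamma_{n\theta}(u-),\theta,u)\,N_\cdot(du) + \int_0^tK'_n(\Gamma_{n\theta}(u-),\theta,u)\,\dot\Gamma_{n\theta}(u-)\,N_\cdot(du), \tag{6.1}$$

$$\dot\Gamma_\theta(t) = \int_0^t\dot k(\Gamma_\theta(u-),\theta,u)\,EN(du) + \int_0^tk'(\Gamma_\theta(u-),\theta,u)\,\dot\Gamma_\theta(u-)\,EN(du). \tag{6.2}$$

Here $\dot K = -\dot S/S^2$, $K' = -S'/S^2$, $\dot k = -\dot s/s^2$ and $k' = -s'/s^2$. Assumption 2.0(iii) implies that $\Gamma_\theta(\tau)\le A_1(\tau)<\infty$. For $G = \dot k, k'$, Conditions 2.1 imply that $\sup_\theta\int_0^\tau|G(\Gamma_\theta(u-),\theta,u)|\,EN(du)<\infty$, so that $\sup_\theta\|\dot\Gamma_\theta\|_\infty<\infty$. Uniform consistency of the process $\Gamma_{n\theta}$ also implies that for $G_n = K'_n,\dot K_n$ we have

$$\limsup_n\sup_{\theta,t}\int_0^t|G_n(\Gamma_{n\theta}(u-),\theta,u)|\,N_\cdot(du)<\infty$$

almost surely. Subtracting equation (6.2) from (6.1), we get

$$[\dot\Gamma_{n\theta}-\dot\Gamma_\theta](t) = \xi_n(\theta,t) + \int_0^t[\dot\Gamma_{n\theta}-\dot\Gamma_\theta](u-)\,K'_n(\Gamma_{n\theta}(u-),\theta,u)\,N_\cdot(du),$$

where

$$\xi_n(\theta,t) = R_{12n}(t,\theta) + \int_0^t\dot\Gamma_\theta(u-)\,R_{11n}(du,\theta).$$

By Lemma 5.1 and the Fubini theorem, we have $\|\xi_n\|\to 0$ a.s. Using Gronwall's inequality (Section 9), $|\dot\Gamma_{n\theta}-\dot\Gamma_\theta|(t)\to 0$ a.s. uniformly in $t$ and $\theta$.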

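Further, consider the remainder term $\mathrm{rem}_n(h,\theta,t) = \Gamma_{n,\theta+h}(t) - \Gamma_{n\theta}(t) - h^T\dot\Gamma_{n\theta}(t)$ for $\theta,\theta+h\in\Theta$. Set $h_{2n} = \Gamma_{n,\theta+h}-\Gamma_{n,\theta}$. We have

$$\mathrm{rem}_n(h,\theta,t) = h^T\int_0^t\psi_{2n}(h,\theta,u)\,N_\cdot(du) + \int_0^t\mathrm{rem}_n(h,\theta,u-)\,\psi_{1n}(h,\theta,u)\,N_\cdot(du),$$

where

$$\psi_{1n}(h,\theta,u) = \int_0^1K'_n\big(\Gamma_{n\theta}(u-)+\lambda h_{2n}(u-),\theta+\lambda h,u\big)\,d\lambda,$$

$$\begin{aligned}\psi_{2n}(h,\theta,u) &= \int_0^1\big[\dot K_n\big(\Gamma_{n\theta}(u-)+\lambda h_{2n}(u-),\theta+\lambda h,u\big) - \dot K_n(\Gamma_{n\theta}(u-),\theta,u)\big]\,d\lambda\\ &\quad+\int_0^1\big[K'_n\big(\Gamma_{n\theta}(u-)+\lambda h_{2n}(u-),\theta+\lambda h,u\big) - K'_n(\Gamma_{n\theta}(u-),\theta,u)\big]\,\dot\Gamma_{n\theta}(u-)\,d\lambda.\end{aligned}$$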
We have $\int_0^t|\psi_{1n}(h,\theta,u)|\,N_\cdot(du)\le\rho_n(t)$ and $\int_0^t|\psi_{2n}(h,\theta,u)|\,N_\cdot(du)\le|h|\int_0^tB_n(u)\,N_\cdot(du)$ for a process $B_n$ with $\limsup_n\int_0^\tau B_n(u)\,N_\cdot(du) = O(1)$ a.s. This follows from Condition 2.1 and some elementary algebra. By Gronwall's inequality, $\limsup_n\sup_{t\le\tau}|\mathrm{rem}_n(h,\theta,t)| = O(|h|^2) = o(|h|)$ a.s. A similar argument shows that if $h_n$ is a nonrandom sequence with $h_n = O(n^{-1/2})$, then $\limsup_n\sup_{t\le\tau}|\mathrm{rem}_n(h_n,\theta,t)| = O(n^{-1})$ a.s. If $h_n$ is a random sequence with $|h_n|\to_P 0$, then $\sup_{t\le\tau}|\mathrm{rem}_n(h_n,\theta,t)| = O_P(|h_n|^2)$.
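Equation (6.1) can be checked against finite differences. In the discrete setting of the sketch below, the recursion defining $\Gamma_{n\theta}$ is differentiated step by step, which gives a discrete analogue of (6.1) with $\dot K_n = -\dot S/S^2$ and $K'_n = -S'/S^2$. The hazard rate $\alpha(x,\theta|z) = e^{\theta z}(1+0.5e^{-x})$, the scalar parameter, the form $S = n^{-1}\sum_iY_i\alpha_i$ and the toy data are all hypothetical. A central difference should agree with $\dot\Gamma_{n\theta}$, and $\mathrm{rem}_n(h,\theta,t)$ should be of order $h^2$.

```python
import math


def alpha(x, theta, z):
    # Hypothetical hazard rate alpha(x, theta | z) = exp(theta z) (1 + 0.5 e^{-x}).
    return math.exp(theta * z) * (1.0 + 0.5 * math.exp(-x))


def alpha_theta(x, theta, z):
    # Derivative of alpha with respect to the scalar parameter theta.
    return z * alpha(x, theta, z)


def alpha_x(x, theta, z):
    # Derivative of alpha with respect to x.
    return -0.5 * math.exp(theta * z) * math.exp(-x)


def gamma_and_dot(data, theta):
    """Discrete Gamma_{n theta} and its theta-derivative from the analogue of (6.1)."""
    n = len(data)
    times = sorted({x for x, d, _ in data if d == 1})
    gam, dot = 0.0, 0.0
    gams, dots = [], []
    for t in times:
        risk = [z for x, _, z in data if x >= t]
        d_n = sum(1 for x, d, _ in data if x == t and d == 1) / n
        s = sum(alpha(gam, theta, z) for z in risk) / n
        s_theta = sum(alpha_theta(gam, theta, z) for z in risk) / n
        s_x = sum(alpha_x(gam, theta, z) for z in risk) / n
        k_dot = -s_theta / s ** 2     # K_dot = -S_dot / S^2
        k_prime = -s_x / s ** 2       # K' = -S' / S^2
        # Both integrands use left limits, so update the derivative before Gamma.
        dot = dot + d_n * (k_dot + k_prime * dot)
        gam = gam + d_n / s
        gams.append(gam)
        dots.append(dot)
    return gams, dots
```

For example, `gamma_and_dot(data, 0.3)` returns the two paths at the failure times; comparing `gamma_and_dot(data, 0.3 + h)[0]` with the first-order expansion gives $\mathrm{rem}_n(h,\theta,t)$ directly.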



6.4. Part (iv) 

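Next suppose that $\theta_0$ is a fixed point in $\Theta$, $EN(t)$ is continuous, and $\hat\theta$ is a $\sqrt{n}$-consistent estimate of $\theta_0$. Since $EN(t)$ is a continuous function, $\{W(t,\theta): t\le\tau,\theta\in\Theta\}$ converges weakly to a process $W$ whose paths can be taken to be continuous with respect to the supremum norm. Because $\sqrt{n}[\hat\theta-\theta_0]$ is bounded in probability, we have

$$\sqrt{n}[\Gamma_{n\hat\theta}-\Gamma_{\theta_0}] - \sqrt{n}[\hat\theta-\theta_0]^T\dot\Gamma_{\theta_0} = W(\cdot,\hat\theta) + \sqrt{n}\big[\Gamma_{\hat\theta}-\Gamma_{\theta_0}-(\hat\theta-\theta_0)^T\dot\Gamma_{\theta_0}\big] = W(\cdot,\hat\theta) + O_P(\sqrt{n}\,|\hat\theta-\theta_0|^2) \Rightarrow W(\cdot,\theta_0)$$

by weak convergence of the process $\{W(t,\theta): t\le\tau,\theta\in\Theta\}$ and the consistency of $\hat\theta$.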



7. Proof of Proposition 2.2 

The first part follows from Remark 3.1 and part (iv) of Proposition 2.1. Note that at the true parameter value $\theta = \theta_0$, we have $\sqrt{n}[\Gamma_{n\theta_0}-\Gamma_{\theta_0}](t) = n^{1/2}\int_0^tR_{1n}(du,\theta_0)\,V_{\theta_0}(u,t) + o_P(1)$, where $R_{1n}$ is defined as in Lemma 5.1,

$$R_{1n}(t,\theta) = \frac{1}{n}\sum_{i=1}^n\int_0^t\frac{M_i(du,\theta)}{s(\Gamma_\theta(u-),\theta,u)},$$

and $M_i(t,\theta) = N_i(t) - \int_0^tY_i(u)\,\alpha(\Gamma_\theta(u-),\theta,Z_i)\,\Gamma_\theta(du)$.
We shall now consider the score process. Define

$$U_{1n}(\theta) = \frac{1}{n}\sum_{i=1}^n\int_0^\tau b_i(\Gamma_\theta(t),\theta)\,M_i(dt,\theta), \qquad U_{2n}(\theta) = \int_0^\tau R_{1n}(du,\theta)\int_{(u,\tau]}V_\theta(u,v-)\,r(dv,\theta).$$

Here $b_i(\Gamma_\theta(u),\theta) = b_{1i}(\Gamma_\theta(u),\theta) - b_{2i}(\Gamma_\theta(u),\theta)\,\varphi_{\theta_0}(u)$, with $b_{1i}(\Gamma_\theta(t),\theta) = \dot\ell(\Gamma_\theta(t),\theta,Z_i) - [\dot s/s](\Gamma_\theta(t),\theta,t)$ and $b_{2i}(\Gamma_\theta(t),\theta) = \ell'(\Gamma_\theta(t),\theta,Z_i) - [s'/s](\Gamma_\theta(t),\theta,t)$. The function $r(\cdot,\theta)$ is the limit in probability of the term $\hat r_1(\cdot,\theta)$ given below. Under Condition 2.2, it reduces at $\theta = \theta_0$ to

$$r(t,\theta_0) = -\int_0^t\rho_v(u,\theta_0)\,EN(du),$$

where $\rho_v(u,\theta_0)$ is the conditional correlation defined in Section 2.3. The terms $\sqrt{n}U_{1n}(\theta_0)$ and $\sqrt{n}U_{2n}(\theta_0)$ are uncorrelated sums of iid mean zero variables, and their sum converges weakly to a mean zero normal variable with covariance matrix $\Sigma_{2,\varphi}(\theta_0,\tau)$ given in the statement of Proposition 2.2.
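We decompose the process $U_n(\theta)$ as $U_n(\theta) = \widetilde U_n(\theta) + \overline U_n(\theta)$, where

$$\widetilde U_n(\theta) = \frac{1}{n}\sum_{i=1}^n\int_0^\tau\big[b_{1i}(\Gamma_{n\theta}(t),\theta,t) - b_{2i}(\Gamma_{n\theta}(t),\theta,t)\,\varphi_{\theta_0}(t)\big]\,N_i(dt),$$

$$\overline U_n(\theta) = -\frac{1}{n}\sum_{i=1}^n\int_0^\tau b_{2i}(\Gamma_{n\theta}(t),\theta,t)\,[\varphi_{n\theta}-\varphi_{\theta_0}](t)\,N_i(dt).$$

We have $\widetilde U_n(\theta) = \sum_{j=1}^3U_{nj}(\theta)$, where

$$U_{n1}(\theta) = U_{1n}(\theta) + B_{1n}(\tau,\theta) - \int_0^\tau\varphi_{\theta_0}(u)\,B_{2n}(du,\theta), \qquad U_{n2}(\theta) = \int_0^\tau[\Gamma_{n\theta}-\Gamma_\theta](t)\,\hat r_1(dt,\theta), \qquad U_{n3}(\theta) = \int_0^\tau[\Gamma_{n\theta}-\Gamma_\theta](t)\,\hat r_2(dt,\theta).$$

As in Section 2.4, $b_{1i}(x,\theta,t) = \dot\ell(x,\theta,Z_i) - [\dot S/S](x,\theta,t)$ and $b_{2i}(x,\theta,t) = \ell'(x,\theta,Z_i) - [S'/S](x,\theta,t)$. If $\dot b_{pi}$ and $b'_{pi}$ are the derivatives of these functions with respect to $\theta$ and $x$, then

$$\hat r_2(s,\theta) = \frac{1}{n}\sum_{i=1}^n\int_0^s\int_0^1\hat r_{2i}(t,\theta,\lambda)\,d\lambda\,N_i(dt),$$

$$\hat r_{2i}(t,\theta,\lambda) = \big[b'_{1i}(\Gamma_\theta(t)+\lambda(\Gamma_{n\theta}-\Gamma_\theta)(t),\theta,t) - b'_{1i}(\Gamma_\theta(t),\theta,t)\big] - \big[b'_{2i}(\Gamma_\theta(t)+\lambda(\Gamma_{n\theta}-\Gamma_\theta)(t),\theta,t) - b'_{2i}(\Gamma_\theta(t),\theta,t)\big]\,\varphi_{\theta_0}(t).$$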

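We also have $\overline U_n(\theta) = U_{n4}(\theta) + U_{n5}(\theta)$, where

$$U_{n4}(\theta) = -\int_0^\tau[\varphi_{n\theta}-\varphi_{\theta_0}](t)\,B_n(dt,\theta),$$

$$U_{n5}(\theta) = -\frac{1}{n}\sum_{i=1}^n\int_0^\tau[\varphi_{n\theta}-\varphi_{\theta_0}](t)\,\big[b_{2i}(\Gamma_{n\theta}(t),\theta,t) - b_{2i}(\Gamma_\theta(t),\theta,t)\big]\,N_i(dt),$$

$$B_n(t,\theta) = \frac{1}{n}\sum_{i=1}^n\int_0^tb_{2i}(\Gamma_\theta(u),\theta,u)\,N_i(du) = \frac{1}{n}\sum_{i=1}^n\int_0^tb_{2i}(\Gamma_\theta(u),\theta,u)\,M_i(du) + B_{2n}(t,\theta).$$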



We first show that $\overline U_n(\theta_0) = o_P(n^{-1/2})$. By Lemma 5.1, $\sqrt{n}B_{2n}(t,\theta_0)$ converges in probability to 0, uniformly in $t$. At $\theta = \theta_0$, the first term of $B_n$ multiplied by $\sqrt{n}$ converges weakly to a mean zero Gaussian martingale. We have $\|\varphi_{n\theta_0}-\varphi_{\theta_0}\|_\infty = o_P(1)$, $\|\varphi_{\theta_0}\|_v<\infty$ and $\limsup_n\|\varphi_{n\theta_0}\|_v<\infty$. Integration by parts, the Skorohod-Dudley construction and arguments similar to Lemma A.3 in [7] show that $\sqrt{n}U_{n4}(\theta_0) = o_P(1)$. We also have $\sqrt{n}U_{n5}(\theta_0) = \int_0^\tau O_P(\sqrt{n}\,|\Gamma_{n\theta_0}-\Gamma_{\theta_0}|(t))\,|\varphi_{n\theta_0}-\varphi_{\theta_0}|(t)\,N_\cdot(dt) = o_P(1)$.

We now consider the term $\widetilde U_n(\theta_0)$. We have $\sqrt{n}U_{n3}(\theta_0) = \sqrt{n}\int_0^\tau O_P(|\Gamma_{n\theta_0}-\Gamma_{\theta_0}|(t)^2)\,N_\cdot(dt) = o_P(1)$. We also have $\|r(\cdot,\theta_0)\|_v<\infty$, $\limsup_n\|\hat r_1(\cdot,\theta_0)\|_v<\infty$ and $\|[\hat r_1-r](\cdot,\theta_0)\|_\infty = o_P(1)$, so that the same integration by parts argument implies that $\sqrt{n}U_{n2}(\theta_0) = \sqrt{n}U_{2n}(\theta_0) + o_P(1)$. Finally, $\sqrt{n}U_{n1}(\theta_0) = \sqrt{n}U_{1n}(\theta_0) + o_P(1)$, by Lemma 5.1 and the Fubini theorem.

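Suppose now that $\theta$ varies over a ball $B(\theta_0,\varepsilon_n)$ centered at $\theta_0$ and having radius $\varepsilon_n$, where $\varepsilon_n\downarrow 0$ and $\sqrt{n}\varepsilon_n\to\infty$. It is easy to verify that for $\theta,\theta'\in B(\theta_0,\varepsilon_n)$ we have $\widetilde U_n(\theta')-\widetilde U_n(\theta) = -(\theta'-\theta)^T\Sigma_{1n}(\theta_0) + (\theta'-\theta)^TR_n(\theta,\theta')$, where $R_n(\theta,\theta')$ is a remainder term satisfying $\sup\{|R_n(\theta,\theta')|:\theta,\theta'\in B(\theta_0,\varepsilon_n)\} = o_P(1)$. The matrix $\Sigma_{1n}(\theta)$ is equal to the sum $\Sigma_{1n}(\theta) = \Sigma_{11n}(\theta) + \Sigma_{12n}(\theta)$,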



1 n r 

n i=l J o 



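$$\Sigma_{12n}(\theta) = -\frac{1}{n}\sum_{i=1}^n\int_0^\tau\big[f_i - S^f/S\big](\Gamma_{n\theta}(u),\theta,u)\,N_i(du),$$

where $S^f(\Gamma_{n\theta}(u),\theta,u) = n^{-1}\sum_{j=1}^nY_j(u)\,[\alpha_jf_j](\Gamma_{n\theta}(u),\theta,u)$ and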

5ii(0,r ne (u),u) = &ii(r„ e (u),6») - &2i(r„e(w),6')^eo( u ): 
52j(^,r„ e (M),u) = b u (T ne (u),9) + b 2i (T ng (u),9)t ne (u) 

fi{9,T n8 {u),u) = - (T n8 (u),9,Zi) - —(T ng (u),9,Z i )ip0 o (u) 1 
a a 

+ t ne {u)[—{T ne {u),9,Z l )] T 
a 

+ —(rn9(u),9,Zi)t ng (u)ip go (u) T . 

a 



These matrices satisfy $\Sigma_{11n}(\theta_0)\to_P\Sigma_{1,\varphi}(\theta_0,\tau)$ and $\Sigma_{12n}(\theta_0)\to_P 0$, and $\Sigma_{1,\varphi}(\theta_0,\tau) = \Sigma_1(\theta_0)$ is defined in the statement of Proposition 2.2. By assumption this matrix is non-singular. Finally, set $h_n(\theta) = \theta + \Sigma_1(\theta_0)^{-1}U_n(\theta)$. It is easy to verify that this mapping forms a contraction on the set $\{\theta: |\theta-\theta_0|\le A_n/(1-a_n)\}$, where $A_n = |\Sigma_1(\theta_0)^{-1}U_n(\theta_0)| = O_P(n^{-1/2})$ and $a_n = \sup\{|I - \Sigma_1(\theta_0)^{-1}\Sigma_{1n}(\theta_0) + \Sigma_1(\theta_0)^{-1}R_n(\theta,\theta')|: \theta,\theta'\in B(\theta_0,\varepsilon_n)\} = o_P(1)$. The argument is similar to Bickel et al. ([6], p. 518), though note that we cannot apply their mean value theorem arguments.
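In the proportional hazards special case, $\alpha(x,\theta|z) = \exp(\theta^Tz)$ does not depend on $x$, so $\ell' = 0$, $b_{2i} = 0$, and, reading the definitions above in that case, $U_n(\theta)$ reduces to the Cox partial likelihood score divided by $n$. The Python sketch below iterates $h(\theta) = \theta + \Sigma^{-1}U_n(\theta)$ for a scalar covariate on toy data. Because $\Sigma_1(\theta_0)$ is unknown in practice, the sketch freezes the matrix at the starting value (an assumption of the illustration, not part of the proof), and it handles ties in the Breslow manner by treating each failure separately.

```python
import math


def cox_score_info(data, theta):
    """Cox partial-likelihood score U_n(theta) and information, both divided by n (scalar z)."""
    n = len(data)
    score, info = 0.0, 0.0
    for x_i, d_i, z_i in data:
        if d_i != 1:
            continue
        risk = [z for x, _, z in data if x >= x_i]
        s0 = sum(math.exp(theta * z) for z in risk)
        s1 = sum(z * math.exp(theta * z) for z in risk)
        s2 = sum(z * z * math.exp(theta * z) for z in risk)
        score += z_i - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return score / n, info / n


def fixed_point_root(data, theta_init, tol=1e-12, max_iter=500):
    """Iterate h(theta) = theta + Sigma^{-1} U_n(theta) with Sigma frozen at theta_init."""
    _, sigma = cox_score_info(data, theta_init)
    theta = theta_init
    for _ in range(max_iter):
        u, _ = cox_score_info(data, theta)
        new = theta + u / sigma
        if abs(new - theta) < tol:
            return new
        theta = new
    return theta
```

Freezing $\Sigma$ trades the quadratic convergence of Newton's method for a simpler map whose contraction factor is $|1-\Sigma_{1n}(\theta)/\Sigma|$, which mirrors the role of $a_n$ in the argument above.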

Next consider Condition 2.3(v.2). In this case we have $\widetilde U_n(\theta')-\widetilde U_n(\theta) = -(\theta'-\theta)^T\Sigma_{1n}(\theta_0) + (\theta'-\theta)^TR_n(\theta,\theta')$, where $\sup\{|R_n(\theta,\theta')|:\theta,\theta'\in B(\theta_0,\varepsilon_n)\} = o_P(1)$. In addition, for $\theta\in B(\theta_0,\varepsilon_n)$, we have the expansion $\overline U_n(\theta) = [\overline U_n(\theta)-\overline U_n(\theta_0)] + \overline U_n(\theta_0) = o_P(|\theta-\theta_0| + n^{-1/2})$. The same argument as above shows that the equation $\widetilde U_n(\theta) = 0$ has, with probability tending to 1, a unique root $\hat\theta_n$ in the ball $B(\theta_0,\varepsilon_n)$. But then we also have $U_n(\hat\theta_n) = \widetilde U_n(\hat\theta_n) + \overline U_n(\hat\theta_n) = o_P(|\hat\theta_n-\theta_0| + n^{-1/2}) = o_P(O_P(n^{-1/2}) + n^{-1/2}) = o_P(n^{-1/2})$.

Part (iv) can be verified analogously, i.e. it amounts to showing that if $\sqrt{n}[\hat\theta-\theta_0]$ is bounded in probability, then the remainder term $R_n(\hat\theta,\theta_0)$ is of order $o_P(|\hat\theta-\theta_0|)$, and $\overline U_n(\hat\theta) = o_P(|\hat\theta-\theta_0| + n^{-1/2})$.
8. Proof of Proposition 2.3

Part (i) is verified at the end of the proof. To show part (ii), define

$$D(\lambda) = \sum_{m\ge 0}\frac{(-\lambda)^m}{m!}\,d_m, \qquad D(t,u,\lambda) = \sum_{m\ge 0}\frac{(-\lambda)^m}{m!}\,D_m(t,u).$$

The numbers $d_m$ and the functions $D_m(t,u)$ are given by $d_m = 1$ and $D_m(t,u) = k(t,u)$ for $m = 0$. For $m\ge 1$ set

$$d_m = \int\!\cdots\!\int\det\mathbf{d}_m(s)\,b(ds_1)\cdots b(ds_m), \qquad D_m(t,u) = \int\!\cdots\!\int\det\mathbf{D}_m(t,u;s)\,b(ds_1)\cdots b(ds_m),$$

where for any $s = (s_1,\ldots,s_m)$, $\mathbf{d}_m(s)$ is the $m\times m$ matrix with entries $\mathbf{d}_m(s) = [k(s_i,s_j)]$, and $\mathbf{D}_m(t,u;s)$ is the $(m+1)\times(m+1)$ matrix

$$\mathbf{D}_m(t,u;s) = \begin{pmatrix}k(t,u) & U_m(t;s)\\ V_m(s;u) & \mathbf{d}_m(s)\end{pmatrix},$$

where $U_m(t;s) = [k(t,s_1),\ldots,k(t,s_m)]$ and $V_m(s;u) = [k(s_1,u),\ldots,k(s_m,u)]^T$. By the Fredholm determinant formula [25], the resolvent of the kernel $k$ is given by $\Delta(t,u,\lambda) = D(t,u,\lambda)/D(\lambda)$ for all $\lambda$ such that $D(\lambda)\neq 0$. Moreover,

$$d_m = \int\!\cdots\!\int_{\text{distinct}}\det\mathbf{d}_m(s)\,b(ds_1)\cdots b(ds_m),$$

because the determinant is zero whenever two or more of the points $s_i$, $i = 1,\ldots,m$, are equal. By the Fubini theorem, the right-hand side of the above expression is equal to

$$\sum_\pi\int\!\cdots\!\int_{0<s_1<s_2<\cdots<s_m\le\tau}\det\mathbf{d}_m(s_{\pi(1)},\ldots,s_{\pi(m)})\,b(ds_1)\cdots b(ds_m) = m!\int\!\cdots\!\int_{0<s_1<s_2<\cdots<s_m\le\tau}\det\mathbf{d}_m(s_1,\ldots,s_m)\,b(ds_1)\cdots b(ds_m).$$

The first sum extends over the $m!$ possible permutations $\pi = (\pi(1),\ldots,\pi(m))$ of the index set $\{1,\ldots,m\}$. The second expression follows by noting that

$$\det\mathbf{d}_m(s_{\pi(1)},\ldots,s_{\pi(m)}) = \det\mathbf{d}_m(s_1,\ldots,s_m)$$

for any such permutation, because the matrix $\mathbf{d}_m(s_{\pi(1)},\ldots,s_{\pi(m)})$ is symmetric and to rearrange it into the matrix $\mathbf{d}_m(s_1,\ldots,s_m)$ we need the same number of transpositions of rows and of columns. Since the total number of such transpositions is even, the determinants have the same sign. In the same way, the function $D_m(t,u)$ is equal to

$$m!\int\!\cdots\!\int_{0<s_1<s_2<\cdots<s_m\le\tau}\det\mathbf{D}_m(t,u;s_1,\ldots,s_m)\,b(ds_1)\cdots b(ds_m),$$




D. M. Dabrowska 



so that in both cases it is enough to consider the determinants for ordered sequences $s = (s_1,\ldots,s_m)$, $s_1<s_2<\cdots<s_m$, of points in $(0,\tau]^m$.

For any such sequence $s$, the matrix $\mathbf{d}_m(s)$ has a simple pattern:

$$\mathbf{d}_m(s) = \begin{pmatrix}c(s_1) & c(s_1) & c(s_1) & \cdots & c(s_1)\\ c(s_1) & c(s_2) & c(s_2) & \cdots & c(s_2)\\ c(s_1) & c(s_2) & c(s_3) & \cdots & c(s_3)\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ c(s_1) & c(s_2) & c(s_3) & \cdots & c(s_m)\end{pmatrix}.$$

We have $\mathbf{d}_m(s) = A_m^TC_m(s)A_m$, where $C_m(s)$ is the diagonal matrix of increments

$$C_m(s) = \mathrm{diag}\big[c(s_1)-c(s_0),\,c(s_2)-c(s_1),\,\ldots,\,c(s_m)-c(s_{m-1})\big] \qquad (c(s_0) = 0,\ s_0 = 0),$$

and $A_m$ is the upper triangular matrix

$$A_m = \begin{pmatrix}1 & 1 & \cdots & 1 & 1\\ 0 & 1 & \cdots & 1 & 1\\ \vdots & & \ddots & & \vdots\\ 0 & 0 & \cdots & 1 & 1\\ 0 & 0 & \cdots & 0 & 1\end{pmatrix}.$$

To see this, it is enough to note that Brownian motion is a process with independent increments, and the kernel $k(s,t) = c(s\wedge t)$ is the covariance function of a time-transformed Brownian motion.
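Clearly, $\det A_m = 1$. Therefore

$$\det\mathbf{d}_m(s) = \prod_{j=1}^m[c(s_j)-c(s_{j-1})]$$

and

$$\begin{aligned}\det\mathbf{D}_m(t,u;s) &= \det\mathbf{d}_m(s)\,\big[c(t\wedge u) - U_m(t;s)[\mathbf{d}_m(s)]^{-1}V_m(s;u)\big]\\ &= \det\mathbf{d}_m(s)\,\big[c(t\wedge u) - U_m(t;s)A_m^{-1}C_m^{-1}(s)(A_m^T)^{-1}V_m(s;u)\big].\end{aligned}$$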

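The inverse $A_m^{-1}$ is the bidiagonal (Jordan-type) matrix

$$A_m^{-1} = \begin{pmatrix}1 & -1 & 0 & \cdots & 0\\ 0 & 1 & -1 & \cdots & 0\\ \vdots & & \ddots & \ddots & \vdots\\ 0 & 0 & \cdots & 1 & -1\\ 0 & 0 & \cdots & 0 & 1\end{pmatrix},$$

and a straightforward multiplication yields

$$\det\mathbf{D}_m(t,u;s) = c(t\wedge u)\prod_{j=1}^m[c(s_j)-c(s_{j-1})] - \sum_{i=1}^m[c(t\wedge s_i)-c(t\wedge s_{i-1})][c(u\wedge s_i)-c(u\wedge s_{i-1})]\prod_{j\neq i}[c(s_j)-c(s_{j-1})].$$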
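The factorisation $\mathbf{d}_m(s) = A_m^TC_m(s)A_m$ and the determinant formulas above can be verified in exact rational arithmetic. The Python sketch below takes a hypothetical increasing function $c(x) = x^2+x$ with $c(0) = 0$ and a few ordered points; it checks the factorisation, the product formula for $\det\mathbf{d}_m(s)$, and the Schur complement expression for $\det\mathbf{D}_m(t,u;s)$ computed through the bidiagonal inverse of $A_m$.

```python
from fractions import Fraction


def det(mat):
    """Exact determinant by Gaussian elimination over Fractions."""
    a = [[Fraction(v) for v in row] for row in mat]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= factor * a[col][k]
    return result


def c(x):
    # Hypothetical increasing function with c(0) = 0, standing in for c in k(s, t) = c(s ^ t).
    return x * x + x


def kernel_matrix(s):
    """d_m(s) = [c(s_i ^ s_j)] for ordered points s_1 < ... < s_m."""
    return [[c(min(a, b)) for b in s] for a in s]


def factor_product(s):
    """Return (A_m^T C_m(s) A_m, increments) with A_m upper triangular of ones."""
    m = len(s)
    pts = [Fraction(0)] + list(s)
    inc = [c(pts[j + 1]) - c(pts[j]) for j in range(m)]   # diagonal of C_m(s)
    a = [[1 if j >= i else 0 for j in range(m)] for i in range(m)]
    prod = [[sum(a[k][i] * inc[k] * a[k][j] for k in range(m)) for j in range(m)]
            for i in range(m)]
    return prod, inc


def schur_factor(t, u, s, inc):
    """c(t ^ u) - U_m(t; s) d_m(s)^{-1} V_m(s; u), using the bidiagonal inverse of A_m."""
    pts = [Fraction(0)] + list(s)
    du = [c(min(t, pts[j + 1])) - c(min(t, pts[j])) for j in range(len(s))]
    dv = [c(min(u, pts[j + 1])) - c(min(u, pts[j])) for j in range(len(s))]
    return c(min(t, u)) - sum(x * y / w for x, y, w in zip(du, dv, inc))


def bordered(t, u, s):
    """D_m(t, u; s): first row (k(t,u), U_m(t;s)), first column (k(t,u), V_m(s;u))."""
    top = [c(min(t, u))] + [c(min(t, b)) for b in s]
    return [top] + [[c(min(a, u))] + row for a, row in zip(s, kernel_matrix(s))]
```

For example, with `s = [Fraction(1), Fraction(2), Fraction(4), Fraction(7)]` the increments are 2, 4, 14 and 36, and `det(kernel_matrix(s))` equals their product.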
By noting that the $i$-th summand is zero whenever $t\wedge u<s_{i-1}$ and using induction on $m$, it is easy to verify that for $t\le u$ the determinant reduces to the sum

$$\begin{aligned}\det\mathbf{D}_m(t,u;s) &= 1(t\le u\le s_1)\,c(t)[c(s_1)-c(u)]\prod_{j=2}^m[c(s_j)-c(s_{j-1})]\\ &\quad+ 1(s_m\le t\le u)\prod_{j=1}^m[c(s_j)-c(s_{j-1})]\,[c(t)-c(s_m)]\\ &\quad+\sum_{i=1}^{m-1}1(s_i\le t\le u\le s_{i+1})\Big(\prod_{j=1}^i[c(s_j)-c(s_{j-1})]\,[c(t)-c(s_i)]\Big)\Big([c(s_{i+1})-c(u)]\prod_{j=i+2}^m[c(s_j)-c(s_{j-1})]\Big),\end{aligned}$$

where in the last sum a product over an empty set of indices is interpreted as equal to 1. Thus we have a simple expression for the two determinants. Integration with respect to the product measure $b(ds_1)\cdots b(ds_m)$ and induction on $m$ also yield



$$\frac{1}{m!}\,d_m = \Psi_{0m}(0,\tau), \qquad \frac{1}{m!}\,D_m(t,u) = \sum_{l=0}^m\Psi_{1l}(0,t\wedge u)\,\Psi_{0,m-l}(t\vee u,\tau),$$

for $m\ge 0$. The numerator and denominator of the Fredholm determinant formula are bounded functions for any point $\tau$ satisfying Condition 2.0(iii). For $\lambda = -1$, the ratio $\Delta(t,u,\lambda) = D(t,u,\lambda)/D(\lambda)$ reduces to the function $\Delta$ given in the statement of part (ii). Using monotonicity of the functions $\Psi_0$ with respect to the length of the interval $(s,t]$, we also have $\Delta(t,u)\le\Psi_0(0,\tau)\,c(t\wedge u)$. If $\tau_0$ is a continuity point of $EY(t)$ and $k(\tau_0)<\infty$, then the denominator is bounded, and the inequality is satisfied for any $u,t<\tau_0$.
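The closed form for $\det\mathbf{D}_m(t,u;s)$ when $t\le u$, which is nonzero only when $t$ and $u$ fall in the same gap between consecutive points $s_j$, can be checked exactly against the bordered determinant. The Python sketch uses a hypothetical increasing function $c(x) = x^2+x$ with $c(0) = 0$ and a handful of ordered points; indices in the comments refer to the 1-based notation of the text.

```python
from fractions import Fraction


def det(mat):
    """Exact determinant by Gaussian elimination over Fractions."""
    a = [[Fraction(v) for v in row] for row in mat]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= factor * a[col][k]
    return result


def c(x):
    # Hypothetical increasing function with c(0) = 0.
    return x * x + x


def bordered(t, u, s):
    """The (m+1) x (m+1) matrix D_m(t, u; s) for the kernel k(a, b) = c(a ^ b)."""
    pts = [t] + list(s)
    cols = [u] + list(s)
    return [[c(min(a, b)) for b in cols] for a in pts]


def prod(values):
    out = Fraction(1)
    for v in values:
        out *= v
    return out


def closed_form(t, u, s):
    """Closed-form det D_m(t, u; s) for t <= u and s_1 < ... < s_m."""
    m = len(s)
    pts = [Fraction(0)] + list(s)
    inc = [c(pts[j]) - c(pts[j - 1]) for j in range(1, m + 1)]  # inc[j-1] = c(s_j) - c(s_{j-1})
    total = Fraction(0)
    if t <= u <= s[0]:
        total += c(t) * (c(s[0]) - c(u)) * prod(inc[1:])
    if s[-1] <= t <= u:
        total += prod(inc) * (c(t) - c(s[-1]))
    for i in range(1, m):  # indicator 1(s_i <= t <= u <= s_{i+1}), i = 1, ..., m-1
        if s[i - 1] <= t <= u <= s[i]:
            left = prod(inc[:i]) * (c(t) - c(s[i - 1]))
            right = (c(s[i]) - c(u)) * prod(inc[i + 1:])
            total += left * right
    return total
```

For instance, with `s = [Fraction(1), Fraction(3), Fraction(6)]`, $t = 4$ and $u = 5$, both `closed_form` and `det(bordered(...))` give 1920, while $t = 2$, $u = 7$ (different gaps) gives 0.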

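Parts (iii) and (iv) are easy to verify using this last observation. For example, if $k(\tau_0)<\infty$, then for any $\eta\in L_2(b)$ the Fredholm equation has a unique solution $\psi$ and

$$\|\psi\|_2 \le \|\eta\|_2 + \Big[\int_0^{\tau_0}\Big(\int_0^{\tau_0}\Delta(t,u)\,b(du)\,\eta(u)\Big)^2b(dt)\Big]^{1/2}.$$

By the Cauchy-Schwarz inequality and monotone convergence, the second term is bounded by

$$\begin{aligned}\|\eta\|_2\Big[\int_0^{\tau_0}\!\!\int_0^{\tau_0}\Delta^2(t,u)\,b(du)\,b(dt)\Big]^{1/2} &\le \|\eta\|_2\,\Psi_0(0,\tau_0)\Big[\int_0^{\tau_0}\!\!\int_0^{\tau_0}c^2(t\wedge u)\,b(du)\,b(dt)\Big]^{1/2}\\ &\le \|\eta\|_2\,\Psi_0(0,\tau_0)\Big[\int_0^{\tau_0}\!\!\int_0^{\tau_0}c(t)c(u)\,b(du)\,b(dt)\Big]^{1/2} = \|\eta\|_2\,\Psi_0(0,\tau_0)\,k(\tau_0).\end{aligned}$$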



Part (v) follows from part (iv) and Lemma 2.1. Part (vi) can be verified using 
straightforward but laborious algebra. 
Part (i). For $u>s$, set $c((s,u]) = c(u)-c(s)$. The $n$-th term of the series $\Psi_0(s,t)$ is given by the multiple integral

$$\Psi_{0n}(s,t) = \int\!\cdots\!\int_{s<s_1<\cdots<s_n\le t}c((s,s_1])\,b(ds_1)\,c((s_1,s_2])\,b(ds_2)\cdots c((s_{n-1},s_n])\,b(ds_n),$$

and satisfies $\Psi_{0n}(s,t)\le(n!)^{-1}[I(s,t)]^n$, where $I(s,t) = \int_s^tc((s,u])\,b(du)$. The integral $I(s,t)$ is increasing with the width of the interval $(s,t]$ and is bounded by $k(t)$. Thus $\Psi_0(s,t)\le\exp k(t)<\infty$ for all $0\le s<t\le\tau$. If in addition $k(\tau_0)<\infty$, then $\Psi_0(s,t)$ is bounded for all $0\le s<t\le\tau_0$. In both circumstances, this implies that the remaining interval functions $\Psi_j(s,t)$, $0\le s<t\le\tau$, are finite for any point $\tau$ satisfying Condition 2.0(iii), and monotonically increasing with the size of the interval.
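For a measure $b$ with finitely many atoms, the series $\Psi_0(s,t) = \sum_n\Psi_{0n}(s,t)$ terminates, so the term-wise bound $\Psi_{0n}(s,t)\le I(s,t)^n/n!$ and the Volterra equation $\Psi_0(s,t) = 1+\int_{(s,t]}c((s,u])\,b(du)\,\Psi_0(u,t)$ can both be checked directly. The atoms, masses and the function $c$ in the Python sketch are hypothetical. Each term is computed from the first-step decomposition $\Psi_{0n}(s,t) = \int_{(s,t]}c((s,s_1])\,b(ds_1)\,\Psi_{0,n-1}(s_1,t)$, which follows from the multiple integral above.

```python
import math
from functools import lru_cache

ATOMS = [0.5, 1.0, 1.7, 2.4, 3.0]   # hypothetical atoms of the measure b
MASS = [0.4, 0.3, 0.8, 0.2, 0.5]    # hypothetical masses b({v})


def c(x):
    # Hypothetical increasing function with c(0) = 0.
    return x + 0.5 * x * x


def c_int(s, u):
    # c((s, u]) = c(u) - c(s)
    return c(u) - c(s)


@lru_cache(maxsize=None)
def psi0_term(n, s, t):
    """Psi_{0n}(s, t) via the first-step decomposition of the multiple integral."""
    if n == 0:
        return 1.0
    return sum(c_int(s, v) * m * psi0_term(n - 1, v, t)
               for v, m in zip(ATOMS, MASS) if s < v <= t)


def psi0_volterra(s, t):
    """Solve Psi_0(s, t) = 1 + int_(s,t] c((s,u]) b(du) Psi_0(u, t) backwards over the atoms."""
    values = {}
    for v in reversed([a for a in ATOMS if a <= t]):
        values[v] = 1.0 + sum(c_int(v, w) * m * values[w]
                              for w, m in zip(ATOMS, MASS) if v < w <= t)
    return 1.0 + sum(c_int(s, w) * m * values[w] for w, m in zip(ATOMS, MASS) if s < w <= t)


def I(s, t):
    # I(s, t) = int_s^t c((s, u]) b(du)
    return sum(c_int(s, v) * m for v, m in zip(ATOMS, MASS) if s < v <= t)
```

Summing `psi0_term(n, 0.0, 3.0)` over `n = 0, ..., len(ATOMS)` reproduces `psi0_volterra(0.0, 3.0)`, since all higher terms vanish for a measure with five atoms.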

While the identities can be verified by applying Fubini to each term of the $\Psi_j$, $j = 0,1,2,3$, series, the following provides an interpretation in terms of linear Volterra equations. First, it is easy to see that the "odd" functions satisfy

$$\begin{aligned}\Psi_1(s,t) &= c((s,t]) + \int_{(s,t]}c((s,u])\,b(du)\,\Psi_1(u,t) = c((s,t]) + \int_{(s,t]}\Psi_1(s,u)\,b(du)\,c((u,t]),\\ \Psi_3(s,t) &= b([s,t)) + \int_{[s,t)}b([s,u))\,c(du)\,\Psi_3(u,t) = b([s,t)) + \int_{[s,t)}\Psi_3(s,u)\,c(du)\,b([u,t)),\end{aligned}$$

so that they form resolvents of linear Volterra equations. The "even" functions $\Psi_0$ and $\Psi_2$ satisfy the equations

$$\begin{aligned}\Psi_0(s,t) &= 1 + \int_{(s,t]}c((s,u])\,b(du)\,\Psi_0(u,t) = 1 + \int_{(s,t]}\Psi_0(s,u-)\,c(du)\,b([u,t]),\\ \Psi_2(s,t) &= 1 + \int_{[s,t)}b([s,u))\,c(du)\,\Psi_2(u,t) = 1 + \int_{[s,t)}\Psi_2(s,u+)\,b(du)\,c((u,t)).\end{aligned}$$

With fixed $t$, the equations

$$\begin{aligned}h_1(s,t) - \int_{(s,t]}c((s,u])\,b(du)\,h_1(u,t) &= g_1(s,t),\\ h_3(s,t) - \int_{[s,t)}b([s,u))\,c(du)\,h_3(u,t) &= g_3(s,t),\end{aligned}$$

have unique solutions

$$\begin{aligned}h_1(s,t) &= g_1(s,t) + \int_{(s,t]}\Psi_1(s,u)\,b(du)\,g_1(u,t),\\ h_3(s,t) &= g_3(s,t) + \int_{[s,t)}\Psi_3(s,u)\,c(du)\,g_3(u,t).\end{aligned}$$
The first pair of equations for $\Psi_0$ and $\Psi_2$ in part (i) follows by setting $g_1(s,t) = 1 = g_3(s,t)$. With $s$ fixed, the equations

$$\begin{aligned}h_1(s,t) - \int_{[s,t)}h_1(s,u+)\,b(du)\,c((u,t)) &= g_1(s,t),\\ h_3(s,t) - \int_{(s,t]}h_3(s,u-)\,c(du)\,b([u,t]) &= g_3(s,t),\end{aligned}$$

have solutions

MM) = <?i(M) + / 

$$h_3(s,t) = g_3(s,t) + \int_{(s,t]}g_3(s,u-)\,c(du)\,\Psi_3(u,t+).$$

The second pair of equations for $\Psi_0$ and $\Psi_2$ in part (i) follows by setting $g_1(s,t) = 1 = g_3(s,t)$. Next, the "odd" functions can be represented in terms of the "even" functions using Fubini.

9. Gronwall's inequalities 

Following Gill and Johansen [18], recall that if $b$ is a càdlàg function of bounded variation, $\|b\|_v\le\eta$, then the associated product integral $V(s,t) = \prod_{(s,t]}(1+b(du))$ satisfies the bound $|V(s,t)|\le\prod_{(s,t]}(1+|b|(du))\le\exp\|b\|_v(s,t]$ uniformly in $0\le s<t\le\tau$. Moreover, the functions $s\mapsto V(s,t)$, $s\le t\le\tau$, and $t\mapsto V(s,t)$, $t\in(s,\tau]$, are of bounded variation with variation norm bounded by $\eta e^\eta$.

The proofs use the following consequence of Gronwall's inequalities in Beesack and in Gill and Johansen [18]. If $b$ is a nonnegative measure and $y\in D([0,\tau])$ is a nonnegative function, then for any $x\in D([0,\tau])$ satisfying

$$0\le x(t)\le y(t) + \int_{(0,t]}x(u-)\,b(du), \qquad t\in[0,\tau],$$

we have

$$0\le x(t)\le y(t) + \int_{(0,t]}y(u-)\,b(du)\,V(u,t), \qquad t\in[0,\tau].$$

Pointwise in $t$, $|x(t)|$ is bounded by

$$\max\{\|y\|_\infty,\|y_-\|_\infty\}\Big[1+\int_{(0,t]}b(du)\,V(u,t)\Big] \le \max\{\|y\|_\infty,\|y_-\|_\infty\}\exp\Big[\int_0^tb(du)\Big].$$

We also have $\|e^{-b}x\|_\infty\le\max\{\|y\|_\infty,\|y_-\|_\infty\}$. Further, if $y\in D([0,\tau])$ and $b$ is a function of bounded variation, then the solution to the linear Volterra equation

$$x(t) = y(t) + \int_{(0,t]}x(u-)\,b(du)$$

is unique and given by

$$x(t) = y(t) + \int_{(0,t]}y(u-)\,b(du)\,V(u,t).$$
We have $|x(t)|\le\max\{\|y\|_\infty,\|y_-\|_\infty\}\exp\int_0^td\|b\|_v$ and $\big\|\exp[-\int_0^\cdot d\|b\|_v]\,|x|\big\|_\infty\le\max\{\|y\|_\infty,\|y_-\|_\infty\}$. If $y_\theta(t)$ and $b_\theta(t) = \int_0^tk_\theta(u)\,\mu(du)$ are functions dependent on a Euclidean parameter $\theta\in\Theta\subset\mathbb{R}^d$, and $|k_\theta(t)|\le k(t)$, then these bounds hold pointwise in $\theta$ and

$$\sup_{t\le\tau}\Big\{\exp\Big[-\int_0^tk(u)\,\mu(du)\Big]|x_\theta(t)|\Big\} \le \max\Big\{\sup_{u\le\tau}|y_\theta|(u),\,\sup_{u\le\tau}|y_\theta(u-)|\Big\}.$$
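In the discrete case, the linear Volterra equation of this section and its product-integral solution can be compared directly. The Python sketch below assumes, for illustration only, that $y$ is a step function jumping only at the atoms of $b$ and that the first atom is strictly positive, so that $x(t_1-) = y(0)$. It solves the equation recursively, evaluates $y(t)+\int_{(0,t]}y(u-)\,b(du)\,V(u,t)$ with $V(u,t) = \prod_{(u,t]}(1+b(dv))$, and allows a signed $b$ so that the bound with $\exp\int_0^td\|b\|_v$ can be checked as well.

```python
import math


def volterra_recursive(y, b):
    """x_k = y_k + sum_{j<=k} x_{j-1} b_j: discrete form of x(t) = y(t) + int x(u-) b(du).

    y[0] is the value of y before the first atom and y[k] its value from the k-th atom on;
    b[k-1] is the (possibly negative) mass of b at the k-th atom.
    """
    x = [y[0]]
    acc = 0.0
    for k in range(1, len(y)):
        acc += x[k - 1] * b[k - 1]
        x.append(y[k] + acc)
    return x


def product_integral(b, j, k):
    """V(t_j, t_k) = product over the atoms in (t_j, t_k] of (1 + b(dv))."""
    out = 1.0
    for i in range(j + 1, k + 1):
        out *= 1.0 + b[i - 1]
    return out


def volterra_closed_form(y, b):
    """x(t) = y(t) + int_(0,t] y(u-) b(du) V(u, t) evaluated at the atoms."""
    return [y[k] + sum(y[j - 1] * b[j - 1] * product_integral(b, j, k) for j in range(1, k + 1))
            for k in range(len(y))]
```

Both functions return the same values; the closed form is the discrete counterpart of the identity $1+\int_{(0,t]}b(du)V(u,t) = V(0,t)$ that underlies the exponential bound.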
